πŸ€– ReAct Agent

DroidRun uses a ReAct (Reasoning + Acting) agent to control Android devices. This powerful approach combines LLM reasoning with concrete actions to achieve complex automation tasks.

πŸ“š What is ReAct?

ReAct is a framework that combines:

  • Reasoning: Using an LLM to interpret tasks, make decisions, and plan steps
  • Acting: Executing concrete actions on an Android device
  • Observing: Getting feedback from actions to inform future reasoning

This loop of reasoning, acting, and observing allows the agent to handle complex, multi-step tasks on Android devices.

πŸ”„ The ReAct Loop

  1. Goal Setting: The user provides a natural language task like β€œOpen settings and enable dark mode”
  2. Reasoning: The LLM analyzes the task and determines what steps are needed
  3. Action Selection: The agent selects an appropriate action (e.g., tapping a UI element)
  4. Execution: The action is executed on the Android device
  5. Observation: The agent observes the result (e.g., a new screen appears)
  6. Further Reasoning: The agent evaluates progress and decides on the next action

This cycle repeats until the task is completed or the maximum number of steps is reached.
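
To make the cycle concrete, here is a minimal Python-like sketch of the loop. The helper calls (decide_next_action, execute) are illustrative placeholders, not DroidRun's actual internals:

# Illustrative sketch of the ReAct loop; helper calls are hypothetical placeholders.
def react_loop(task, llm, device, max_steps=15):
    history = [("goal", task)]
    for _ in range(max_steps):
        # Reasoning: the LLM reviews the goal and all observations so far
        thought, action = llm.decide_next_action(history)  # hypothetical call
        history.append(("thought", thought))

        if action is None:  # the LLM signals that the task is complete
            break

        # Acting: run the chosen action on the Android device
        result = device.execute(action)  # hypothetical call

        # Observing: feed the outcome back into the next reasoning step
        history.append(("observation", result))
    return history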

πŸ› οΈ Available Actions

The ReAct agent can perform a range of concrete actions on Android devices, such as tapping UI elements, to carry out each step of the loop.
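
As a rough illustration, these actions typically map onto device-level operations like the ones sketched below. The device object and its methods are hypothetical placeholders, not DroidRun's actual tool API; consult the API reference for the real action set:

# Illustrative only: hypothetical device-level operations a ReAct step might trigger.
def run_example_actions(device):
    device.tap(x=540, y=1200)        # tap a UI element at given coordinates
    device.input_text("dark mode")   # type text into the focused field
    device.swipe(start=(540, 1600), end=(540, 400))  # scroll the current screen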

πŸ“Έ Vision Capabilities

When vision mode is enabled, the ReAct agent can analyze screenshots to better understand the UI:

agent = ReActAgent(
    task="Open settings and enable dark mode",
    llm=llm_instance,
    vision=True  # Enable vision capabilities
)

This provides several benefits:

  • Visual Context: The LLM can see exactly what’s on screen
  • Better UI Understanding: Recognizes UI elements even if text detection is imperfect
  • Complex Navigation: Handles apps with unusual or complex interfaces more effectively

πŸ“Š Token Usage Tracking

The ReAct agent tracks token usage for all LLM interactions:

# After running the agent
stats = llm.get_token_usage_stats()
print(f"Total tokens: {stats['total_tokens']}")
print(f"API calls: {stats['api_calls']}")

This information is useful for:

  • Cost Management: Track and optimize your API usage costs
  • Performance Tuning: Identify steps that require the most tokens
  • Troubleshooting: Debug issues with prompt sizes or response lengths

🧠 Agent Parameters

When creating a ReAct agent, you can configure several parameters:

agent = ReActAgent(
    task="Open settings and enable dark mode",  # The goal to achieve
    llm=llm_instance,                           # LLM to use for reasoning
    device_serial="DEVICE123",                  # Optional specific device
    max_steps=15,                               # Maximum steps to attempt
    vision=False                                # Whether to enable vision capabilities
)
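
A minimal end-to-end sketch might look like the following. It assumes the agent exposes an asynchronous run() entry point; check the API reference for the exact method name and return value:

# Minimal usage sketch; the run() entry point is assumed, not confirmed here.
import asyncio

async def main():
    agent = ReActAgent(
        task="Open settings and enable dark mode",
        llm=llm_instance,  # an LLM instance created beforehand, as above
        max_steps=15,
    )
    await agent.run()  # assumed async entry point

asyncio.run(main())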

πŸ“Š Step Types

The agent records its progress using different step types:

  • Thought: Internal reasoning about what to do
  • Action: An action to be executed on the device
  • Observation: Result of an action
  • Plan: A sequence of steps to achieve the goal
  • Goal: The target state to achieve

πŸ’‘ Best Practices

  1. Clear Goals: Provide specific, clear instructions
  2. Realistic Tasks: Break complex automation into manageable tasks
  3. Vision for Complex UIs: Enable vision mode for complex UI navigation
  4. Step Limits: Set reasonable max_steps to prevent infinite loops
  5. Device Connectivity: Ensure stable connection to your device
  6. Token Optimization: Monitor token usage for cost-effective automation