# ReAct Agent

Understanding the ReAct Agent system in DroidRun

## 🤖 ReAct Agent
DroidRun uses a ReAct (Reasoning + Acting) agent to control Android devices. This powerful approach combines LLM reasoning with concrete actions to achieve complex automation tasks.
## 🔍 What is ReAct?
ReAct is a framework that combines:

- **Reasoning**: Using an LLM to interpret tasks, make decisions, and plan steps
- **Acting**: Executing concrete actions on an Android device
- **Observing**: Getting feedback from actions to inform future reasoning

This loop of reasoning, acting, and observing allows the agent to handle complex, multi-step tasks on Android devices.
## 🔄 The ReAct Loop
1. **Goal Setting**: The user provides a natural language task like "Open settings and enable dark mode"
2. **Reasoning**: The LLM analyzes the task and determines what steps are needed
3. **Action Selection**: The agent selects an appropriate action (e.g., tapping a UI element)
4. **Execution**: The action is executed on the Android device
5. **Observation**: The agent observes the result (e.g., a new screen appears)
6. **Further Reasoning**: The agent evaluates progress and decides on the next action
This cycle repeats until the task is completed or the maximum number of steps is reached.
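The loop above can be sketched in a few lines of Python. This is an illustrative toy, not DroidRun's implementation; `llm` and `device` are stand-ins for a real LLM client and device controller.

```python
# Illustrative sketch of the ReAct loop, not DroidRun's actual implementation.
# `llm` and `device` stand in for a real LLM client and device controller.

def react_loop(goal, llm, device, max_steps=10):
    """Run reason -> act -> observe until the LLM signals completion."""
    observation = device.describe_screen()  # initial state of the UI
    for _ in range(max_steps):
        # Reasoning: the LLM picks the next action from the goal + latest observation
        decision = llm.decide(goal, observation)
        if decision["action"] == "complete":
            return True  # task finished
        # Acting: execute the chosen action on the device
        result = device.execute(decision["action"], decision.get("args", {}))
        # Observing: feed the result back into the next reasoning step
        observation = result
    return False  # step budget exhausted
```

The `max_steps` cap is what turns the loop into the bounded cycle described above: the agent either reaches a "complete" decision or stops when the budget runs out.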
## 🛠️ Available Actions
The ReAct agent can perform a range of actions on Android devices, such as tapping UI elements, swiping, entering text, and launching apps.
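As a sketch of how such an action set might be organized, the registry below maps action names to callables so the agent can execute whichever action the LLM selects. The action names (`tap`, `input_text`) and the decorator are illustrative, not DroidRun's actual API.

```python
# Illustrative action registry; the action names and signatures here are
# examples, not DroidRun's exact API.

ACTIONS = {}

def action(name):
    """Register a function as an action the agent may select by name."""
    def wrap(fn):
        ACTIONS[name] = fn
        return fn
    return wrap

@action("tap")
def tap(x, y):
    # In a real agent this would send a tap to the device at (x, y)
    return f"tapped ({x}, {y})"

@action("input_text")
def input_text(text):
    # In a real agent this would type text into the focused field
    return f"typed {text!r}"

def dispatch(name, **kwargs):
    """Look up and execute the action the LLM selected."""
    if name not in ACTIONS:
        return f"unknown action: {name}"
    return ACTIONS[name](**kwargs)
```

Keeping actions in a registry like this makes it easy to report the full action list to the LLM and to reject action names it hallucinates.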
## 📸 Vision Capabilities
When vision mode is enabled, the ReAct agent can analyze screenshots to better understand the UI.
This provides several benefits:
- **Visual Context**: The LLM can see exactly what's on screen
- **Better UI Understanding**: Recognizes UI elements even if text detection is imperfect
- **Complex Navigation**: Handles apps with unusual or complex interfaces more effectively
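One common way to feed a screenshot to the LLM is to base64-encode it and attach it to the request alongside a text description of the UI. The message shape below mimics widely used multimodal chat APIs and is an assumption, not DroidRun's exact wiring:

```python
import base64

# Sketch of attaching a screenshot to an LLM request when vision mode is on.
# The message format mimics common multimodal chat APIs; it is an assumption,
# not DroidRun's exact implementation.

def build_messages(task, ui_text, screenshot_png=None):
    content = [{"type": "text", "text": f"Task: {task}\nUI elements: {ui_text}"}]
    if screenshot_png is not None:
        # Encode the raw PNG bytes so they can travel in a JSON payload
        b64 = base64.b64encode(screenshot_png).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return [{"role": "user", "content": content}]
```

When vision mode is off, the same function simply omits the image part, which is also why vision runs consume noticeably more tokens.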
## 📊 Token Usage Tracking
The ReAct agent tracks token usage for all LLM interactions.
This information is useful for:
- **Cost Management**: Track and optimize your API usage costs
- **Performance Tuning**: Identify steps that require the most tokens
- **Troubleshooting**: Debug issues with prompt sizes or response lengths
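A minimal token tracker might accumulate the prompt and completion counts the LLM API reports for each call; the class and field names below are illustrative, not DroidRun's internals:

```python
# Minimal sketch of per-step token accounting; names are illustrative,
# not DroidRun's internal API.

class TokenTracker:
    def __init__(self):
        self.steps = []

    def record(self, prompt_tokens, completion_tokens):
        """Record the usage reported by the LLM API for one call."""
        self.steps.append({"prompt": prompt_tokens,
                           "completion": completion_tokens})

    @property
    def total(self):
        """Total tokens across all recorded steps (for cost tracking)."""
        return sum(s["prompt"] + s["completion"] for s in self.steps)

    def most_expensive_step(self):
        """Index of the costliest step (for performance tuning), or None."""
        if not self.steps:
            return None
        return max(range(len(self.steps)),
                   key=lambda i: self.steps[i]["prompt"] + self.steps[i]["completion"])
```

Recording per step, rather than only a running total, is what enables the performance-tuning and troubleshooting uses listed above.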
## 🔧 Agent Parameters
When creating a ReAct agent, you can configure several parameters, such as the maximum number of steps (`max_steps`) and whether vision mode is enabled.
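A hypothetical parameter bundle is sketched below. `max_steps` and vision mode are mentioned elsewhere in this guide; the remaining fields and all default values are illustrative assumptions, not DroidRun's actual constructor:

```python
from dataclasses import dataclass

# Hypothetical parameter bundle. `max_steps` and `vision` are mentioned in
# this guide; the other fields and every default value are illustrative.

@dataclass
class AgentConfig:
    task: str                  # natural language goal for the agent
    llm_model: str = "gpt-4o"  # which LLM backs the reasoning loop (assumed)
    max_steps: int = 25        # hard cap to prevent infinite loops
    vision: bool = False       # attach screenshots to each reasoning step
```

Grouping parameters this way keeps a long argument list readable and makes defaults explicit in one place.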
## 📝 Step Types
The agent records its progress using different step types:
- **Thought**: Internal reasoning about what to do
- **Action**: An action to be executed on the device
- **Observation**: Result of an action
- **Plan**: A sequence of steps to achieve the goal
- **Goal**: The target state to achieve
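These five step types could be modeled as a simple enum, as sketched below; DroidRun's internal representation may differ:

```python
from enum import Enum

# The five step types above as an enum; DroidRun's internal representation
# may differ from this sketch.

class StepType(Enum):
    THOUGHT = "thought"          # internal reasoning about what to do
    ACTION = "action"            # action to be executed on the device
    OBSERVATION = "observation"  # result of an action
    PLAN = "plan"                # sequence of steps toward the goal
    GOAL = "goal"                # target state to achieve
```

Tagging each recorded step with a type like this makes the agent's trace easy to filter, e.g. to review only the actions it took.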
## 💡 Best Practices
- **Clear Goals**: Provide specific, clear instructions
- **Realistic Tasks**: Break complex automation into manageable tasks
- **Vision for Complex UIs**: Enable vision mode for complex UI navigation
- **Step Limits**: Set a reasonable `max_steps` value to prevent infinite loops
- **Device Connectivity**: Ensure a stable connection to your device
- **Token Optimization**: Monitor token usage for cost-effective automation