DroidAgent
Understanding the DroidAgent system in DroidRun
DroidRun's DroidAgent system combines LLM-based reasoning and execution to control Android devices.
Core Components
The DroidAgent architecture consists of:
- Planning System: Optional planning capabilities provided by PlannerAgent
- Execution System: CodeActAgent for executing tasks
- Tool System: Modular tools for Android device control
- Vision Integration: Optional screen analysis capabilities
Execution Flow
- Goal Setting: The user provides a natural language task like "Open settings and enable dark mode"
- Planning (With Reasoning): If reasoning=True, PlannerAgent breaks the goal down into smaller tasks
- Task Execution: CodeActAgent executes each task using the appropriate tools
- Result Analysis: The agent analyzes results and determines next steps
- Error Handling: Failed tasks can be retried or the plan adjusted
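The flow above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not DroidRun's implementation: `run_flow`, `FlowResult`, and the `plan` and `execute` callables are hypothetical stand-ins for PlannerAgent and CodeActAgent, and treating `max_retries` as the number of extra attempts per task is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FlowResult:
    """Hypothetical result container; not DroidRun's actual return type."""
    success: bool
    task_history: List[Dict] = field(default_factory=list)


def run_flow(
    goal: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str], bool],
    max_retries: int = 2,
) -> FlowResult:
    """Plan the goal, run each task in order, and retry failed tasks."""
    history: List[Dict] = []
    for task in plan(goal):
        for attempt in range(1, max_retries + 2):  # one initial attempt plus retries
            ok = execute(task)
            history.append({"task": task, "attempt": attempt, "ok": ok})
            if ok:
                break
        else:
            # Every attempt failed: stop and report the failure.
            return FlowResult(False, history)
    return FlowResult(True, history)


def demo_plan(goal: str) -> List[str]:
    # Stand-in planner: a real planner would ask an LLM to decompose the goal.
    return ["open settings", "enable dark mode"]


def make_flaky_executor() -> Callable[[str], bool]:
    # Stand-in executor whose very first call fails, to exercise the retry path.
    calls = {"count": 0}

    def execute(task: str) -> bool:
        calls["count"] += 1
        return calls["count"] > 1

    return execute


result = run_flow("Open settings and enable dark mode", demo_plan, make_flaky_executor())
for entry in result.task_history:
    print(entry)
print("success:", result.success)
```

Recording every attempt in `task_history` keeps the retry behaviour visible afterwards, which is the kind of record the debugging advice later on this page relies on.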
Available Tools
The DroidAgent has access to a set of core tools for Android device control.
Vision System
The vision system enhances the agent's capabilities.
Benefits include:
- Visual Analysis: Screen content understanding
- UI Element Detection: Accurate element location
- Error Verification: Visual confirmation of actions
- Complex Navigation: Better handling of dynamic UIs
Planning System
When reasoning is enabled, PlannerAgent plans the work before CodeActAgent executes it.
Features:
- Step Planning: Break down complex tasks
- Error Recovery: Handle unexpected situations
- Optimization: Choose efficient approaches
- Verification: Validate results
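The step planning and error recovery features can be illustrated with a replanning loop. The sketch below is an assumption about how such a loop might look, not PlannerAgent's actual logic: `run_with_replanning`, `demo_plan`, `demo_execute`, and `max_replans` are hypothetical names, and the planner is a hard-coded stand-in for an LLM call.

```python
from typing import Callable, List, Tuple

# A planner receives the goal plus what has completed and failed so far.
Planner = Callable[[str, List[str], List[str]], List[str]]


def run_with_replanning(
    goal: str,
    plan: Planner,
    execute: Callable[[str], bool],
    max_replans: int = 1,
) -> Tuple[bool, List[str], List[str]]:
    """Execute planned tasks; on failure, ask the planner for a new plan."""
    completed: List[str] = []
    failed: List[str] = []
    tasks = plan(goal, completed, failed)
    replans = 0
    while tasks:
        task = tasks.pop(0)
        if execute(task):
            completed.append(task)
            continue
        failed.append(task)
        if replans >= max_replans:
            return False, completed, failed
        replans += 1
        # The new plan replaces whatever remained of the old one.
        tasks = plan(goal, completed, failed)
    return True, completed, failed


def demo_plan(goal: str, completed: List[str], failed: List[str]) -> List[str]:
    # Hard-coded stand-in: switch to an alternative route once something failed.
    if failed:
        steps = ["open settings", "search for dark mode", "enable dark mode"]
    else:
        steps = ["open settings", "tap dark mode toggle"]
    return [step for step in steps if step not in completed]


def demo_execute(task: str) -> bool:
    # Pretend the toggle is not on screen, so tapping it fails.
    return task != "tap dark mode toggle"


ok, done, errors = run_with_replanning(
    "Open settings and enable dark mode", demo_plan, demo_execute
)
print(ok, done, errors)
```

Passing the completed and failed tasks back into the planner is what lets it avoid repeating finished work and choose a different approach after an error.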
Tracing Support
The tracing system helps monitor execution.
For detailed information, see the Execution Tracing documentation.
Configuration
Execution Modes
DroidAgent supports two execution modes:
Direct Execution (reasoning=False)
- Treats the goal as a single task
- Executes it directly with CodeActAgent
- Suitable for simple, straightforward tasks
Planning Mode (reasoning=True)
- PlannerAgent creates a step-by-step plan
- Handles complex, multi-step tasks
- Updates the plan adaptively based on results
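The difference between the two modes comes down to how the task list is built. Here is a minimal sketch of that dispatch, assuming a hypothetical `build_task_list` helper and a toy `split_planner` that stands in for PlannerAgent; neither name is part of DroidRun.

```python
from typing import Callable, List


def split_planner(goal: str) -> List[str]:
    # Toy stand-in for PlannerAgent: split the goal on " and ".
    return [part.strip() for part in goal.split(" and ") if part.strip()]


def build_task_list(
    goal: str,
    reasoning: bool,
    planner: Callable[[str], List[str]] = split_planner,
) -> List[str]:
    """Return the tasks the executor would receive in each mode."""
    if not reasoning:
        # Direct execution: the whole goal is one task.
        return [goal]
    # Planning mode: the planner decomposes the goal into steps.
    return planner(goal)


goal = "Open settings and enable dark mode"
print(build_task_list(goal, reasoning=False))
print(build_task_list(goal, reasoning=True))
```

Direct mode skips the planner call entirely, which is why it suits simple tasks.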
Execution Results
The agent returns detailed execution results.
Best Practices
Use Planning for Complex Tasks
- Enable reasoning for multi-step operations
- Direct mode is faster for simple tasks
Enable Vision When Needed
- Use for UI-heavy interactions
- Provides better screen understanding
Set Appropriate Timeouts
- Adjust based on task complexity
- Consider device performance
Handle Errors Properly
- Configure max_retries for robustness
- Check task_history for debugging
Memory Usage
- Use tools.remember() for important information
- The agent preserves context between planning iterations
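The memory advice can be made concrete with a small sketch. This page names only `tools.remember()`. The signature used below, the `SketchTools` class, and the `planner_context` helper are assumptions made for illustration and do not describe DroidRun's real tools object.

```python
from typing import List


class SketchTools:
    """Hypothetical stand-in for the tools object; only remember() is modeled."""

    def __init__(self) -> None:
        self.memory: List[str] = []

    def remember(self, information: str) -> str:
        # Assumed behaviour: store the note so later planning steps can see it.
        self.memory.append(information)
        return f"Remembered: {information}"


def planner_context(goal: str, tools: SketchTools) -> str:
    """Build the text a planner could receive on its next iteration."""
    lines = [f"Goal: {goal}"]
    if tools.memory:
        lines.append("Remembered so far:")
        lines.extend(f"- {item}" for item in tools.memory)
    return "\n".join(lines)


tools = SketchTools()
tools.remember("Dark mode toggle is under Display settings")
print(planner_context("Open settings and enable dark mode", tools))
```

Keeping remembered items in plain text makes them easy to include in each planning iteration without re-reading the screen.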