DroidAgent
DroidRun uses the DroidAgent system, which combines LLM-based reasoning and execution to control Android devices.
Core Components
The DroidAgent architecture consists of:
- Planning System: Optional planning capabilities provided by PlannerAgent
- Execution System: CodeActAgent for executing tasks
- Tool System: Modular tools for Android device control
- Vision Integration: Optional screen analysis capabilities
Execution Flow
1. Goal Setting: The user provides a natural language task like "Open settings and enable dark mode".
2. Planning (With Reasoning): If reasoning=True, PlannerAgent breaks down the goal into smaller tasks.
3. Task Execution: CodeActAgent executes each task using the appropriate tools.
4. Result Analysis: The agent analyzes results and determines next steps.
5. Error Handling: Failed tasks can be retried or the plan adjusted.
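
The five steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not DroidRun's implementation: `run_goal`, `Planner`, `Executor`, `toy_plan`, and `flaky_execute` are hypothetical names, and the planner and executor are plain functions so the sketch runs without a device or an LLM.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: these are NOT DroidRun's real classes or signatures.
Planner = Callable[[str], List[str]]          # goal -> list of tasks
Executor = Callable[[str], Tuple[bool, str]]  # task -> (success, detail)


def run_goal(goal: str, plan: Planner, execute: Executor,
             reasoning: bool = False, max_retries: int = 1) -> List[dict]:
    """Walk the flow: goal, optional planning, execution, analysis, retry."""
    # Steps 1-2: with reasoning the planner splits the goal; otherwise it is one task.
    tasks = plan(goal) if reasoning else [goal]
    history = []
    for task in tasks:
        attempts = 0
        while True:
            attempts += 1
            success, detail = execute(task)  # Step 3: task execution
            # Step 4: record the result so it can be inspected afterwards.
            history.append({"task": task, "attempt": attempts,
                            "success": success, "detail": detail})
            # Step 5: retry a failed task until the retry budget is spent.
            if success or attempts > max_retries:
                break
        if not success:
            break  # give up on the remaining tasks once a task keeps failing
    return history


# Toy planner and executor so the sketch runs without a device.
def toy_plan(goal: str) -> List[str]:
    return [part.strip() for part in goal.split(" and ")]


_calls = {"count": 0}


def flaky_execute(task: str) -> Tuple[bool, str]:
    _calls["count"] += 1
    if _calls["count"] == 1:  # fail only the very first call to show one retry
        return False, "element not found"
    return True, "ok"


history = run_goal("Open settings and enable dark mode",
                   toy_plan, flaky_execute, reasoning=True)
for entry in history:
    print(entry)
```

With the toy executor, the first task fails once and succeeds on the retry, so the history holds three entries for two tasks.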
Available Tools
The DroidAgent has access to these core tools:
UI Interaction
- get_clickables(): Get interactive UI elements
- tap_by_index(index): Tap element by index
- tap(index): Simplified tap by index
- swipe(start_x, start_y, end_x, end_y): Swipe between coordinates
- input_text(text): Type text
- press_key(keycode): Press system keys (e.g., 4 for BACK)
App Management
- start_app(package): Launch apps
- list_packages(): List installed packages
- install_app(apk_path): Install APKs
Screen Analysis
- take_screenshot(): Capture screen
- extract(filename): Save UI state to JSON
- get_all_elements(): Get complete UI hierarchy
- get_phone_state(): Get current activity and keyboard status
Task Management
- remember(information): Store important information
- get_memory(): Retrieve stored information
- complete(success, reason): Signal task completion
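
To show how these tools might be chained in a single task, here is a rough sketch. `FakeTools` is a hypothetical in-memory stand-in that only borrows the tool names listed above; the real tool class, its return types (for example the shape of `get_clickables()` results), and whether the calls are asynchronous are all assumptions here.

```python
class FakeTools:
    """Hypothetical stand-in mirroring the tool names above.

    It is not DroidRun's tool class; real signatures and return types may differ.
    """

    def __init__(self):
        self.memory = []
        self.log = []
        self.result = None

    def start_app(self, package):
        self.log.append(f"start_app({package})")

    def get_clickables(self):
        # Assumed shape: a list of dicts with an index and the visible text.
        return [{"index": 0, "text": "Network"}, {"index": 1, "text": "Display"}]

    def tap_by_index(self, index):
        self.log.append(f"tap_by_index({index})")

    def remember(self, information):
        self.memory.append(information)

    def get_memory(self):
        return list(self.memory)

    def complete(self, success, reason):
        self.result = (success, reason)


def open_display_settings(tools):
    """Launch Settings, find the 'Display' entry, tap it, and report the outcome."""
    tools.start_app("com.android.settings")
    for element in tools.get_clickables():
        if element["text"] == "Display":
            tools.tap_by_index(element["index"])
            tools.remember("Display settings opened")
            tools.complete(True, "Tapped the Display entry")
            return
    tools.complete(False, "No 'Display' entry among clickable elements")


tools = FakeTools()
open_display_settings(tools)
print(tools.log)
print(tools.result)
```

The pattern is to look up elements first and tap by the returned index, then signal the outcome with `complete(success, reason)` on both the success and the failure path.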
Vision System
The vision system enhances the agent's capabilities:
- Visual Analysis: Screen content understanding
- UI Element Detection: Accurate element location
- Error Verification: Visual confirmation of actions
- Complex Navigation: Better handling of dynamic UIs
Planning System
When reasoning is enabled, the agent uses advanced planning:
- Step Planning: Break down complex tasks
- Error Recovery: Handle unexpected situations
- Optimization: Choose efficient approaches
- Verification: Validate results
Tracing Support
The tracing system helps monitor execution.
Configuration
Execution Modes
DroidAgent supports two execution modes:
Direct Execution (reasoning=False)
- Treats goal as a single task
- Directly executes using CodeActAgent
- Suitable for simple, straightforward tasks
Planning Mode (reasoning=True)
- PlannerAgent creates step-by-step plan
- Handles complex, multi-step tasks
- Adaptively updates plan based on results
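
The "adaptively updates plan" behaviour of planning mode can be pictured as a loop that returns to the planner after each failure. The sketch below is only an illustration: `run_adaptive`, `toy_replan`, `toy_execute`, `STEPS`, and `ALTERNATIVE` are hypothetical, and a real PlannerAgent would derive the revised plan from an LLM rather than a lookup table.

```python
from typing import Callable, List, Tuple

# Hypothetical toy data: a fixed plan and one fallback step.
STEPS = ["open Settings", "open Display", "toggle Dark theme"]
ALTERNATIVE = {"open Display": "search for Display"}


def toy_replan(goal: str, completed: List[str], failures: List[str]) -> List[str]:
    """Return the remaining steps, swapping in a fallback for any failed step."""
    plan = []
    for step in STEPS:
        alt = ALTERNATIVE.get(step)
        if step in completed or alt in completed:
            continue  # already done, directly or through its fallback
        plan.append(alt if (step in failures and alt) else step)
    return plan


def toy_execute(task: str) -> bool:
    return task != "open Display"  # pretend this one step always fails


def run_adaptive(goal: str,
                 replan: Callable[[str, List[str], List[str]], List[str]],
                 execute: Callable[[str], bool],
                 max_rounds: int = 3) -> Tuple[bool, List[str]]:
    """Planning-mode loop: ask for a fresh plan after every failed task."""
    completed, failures = [], []
    for _ in range(max_rounds):
        plan = replan(goal, completed, failures)
        failed = False
        for task in plan:
            if execute(task):
                completed.append(task)
            else:
                failures.append(task)
                failed = True
                break  # hand control back to the planner
        if not failed:
            return True, completed
    return False, completed


ok, done = run_adaptive("Enable dark mode", toy_replan, toy_execute)
print(ok, done)
```

Direct execution corresponds to skipping the planner and passing the whole goal to the executor as one task.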
Execution Results
The agent returns detailed execution results.
Best Practices
- Use Planning for Complex Tasks
  - Enable reasoning for multi-step operations
  - Direct mode is faster for simple tasks
- Enable Vision When Needed
  - Use for UI-heavy interactions
  - Provides better screen understanding
- Set Appropriate Timeouts
  - Adjust based on task complexity
  - Consider device performance
- Handle Errors Properly
  - Check task_history for debugging
- Memory Usage
  - Use tools.remember() for important information
  - Agent preserves context between planning iterations