🤖 DroidAgent

DroidRun controls Android devices through DroidAgent, which combines LLM-based reasoning with task execution.

📚 Core Components

The DroidAgent architecture consists of:

  • Planning System: Optional planning capabilities provided by PlannerAgent
  • Execution System: CodeActAgent for executing tasks
  • Tool System: Modular tools for Android device control
  • Vision Integration: Optional screen analysis capabilities

🔄 Execution Flow

  1. Goal Setting: The user provides a natural-language task such as "Open settings and enable dark mode".
  2. Planning (With Reasoning): If reasoning=True, PlannerAgent breaks the goal down into smaller tasks.
  3. Task Execution: CodeActAgent executes each task using the appropriate tools.
  4. Result Analysis: The agent analyzes the results and determines the next steps.
  5. Error Handling: Failed tasks can be retried, or the plan can be adjusted.
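The flow above can be pictured as a small loop. The following is only a toy model written for this page, not DroidRun's implementation: `run_flow`, `FlowResult`, and the `plan` and `execute` callables are hypothetical stand-ins for PlannerAgent and CodeActAgent.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FlowResult:
    """Hypothetical result record; field names mirror the result keys used on this page."""
    success: bool
    steps: int = 0
    task_history: List[str] = field(default_factory=list)
    reason: str = ""


def run_flow(
    goal: str,
    plan: Callable[[str], List[str]],   # stand-in for PlannerAgent
    execute: Callable[[str], bool],     # stand-in for CodeActAgent
    reasoning: bool = True,
    max_retries: int = 3,
) -> FlowResult:
    """Toy model of the flow: plan, execute each task, retry failures."""
    # Planning: with reasoning the goal is split into tasks,
    # otherwise the goal itself is the only task.
    tasks = plan(goal) if reasoning else [goal]
    result = FlowResult(success=True)
    for task in tasks:
        for attempt in range(1, max_retries + 1):
            result.steps += 1
            ok = execute(task)  # Task execution
            outcome = "ok" if ok else "failed"
            result.task_history.append(f"{task} (attempt {attempt}): {outcome}")
            if ok:  # Result analysis: move on to the next task
                break
        else:
            # Error handling: retries exhausted. A real planner might
            # revise the plan here; this sketch simply stops.
            result.success = False
            result.reason = f"task failed after {max_retries} attempts: {task}"
            return result
    return result
```

Passing `reasoning=False` makes the sketch treat the whole goal as one task, which corresponds to direct execution.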

🛠️ Available Tools

The DroidAgent has access to these core tools:

📸 Vision System

The vision system enhances the agent's capabilities:

agent = DroidAgent(
    goal="Open settings and enable dark mode",
    llm=llm,
    tools_instance=tools,
    vision=True  # Enable vision
)

Benefits include:

  • Visual Analysis: Screen content understanding
  • UI Element Detection: Accurate element location
  • Error Verification: Visual confirmation of actions
  • Complex Navigation: Better handling of dynamic UIs

🎯 Planning System

When reasoning=True, PlannerAgent breaks the goal into smaller tasks before execution:

agent = DroidAgent(
    goal="Configure device settings",
    llm=llm,
    tools_instance=tools,
    reasoning=True  # Enable planning
)

Features:

  • Step Planning: Break down complex tasks
  • Error Recovery: Handle unexpected situations
  • Optimization: Choose efficient approaches
  • Verification: Validate results

🔍 Tracing Support

The tracing system helps monitor execution:

# Start Phoenix server first
# Run 'phoenix serve' in a separate terminal

agent = DroidAgent(
    goal="Your task",
    llm=llm,
    tools_instance=tools,
    enable_tracing=True  # Enable Phoenix tracing
)

For detailed information, see the Execution Tracing documentation.

⚙️ Configuration

import asyncio

from droidrun.agent.droid import DroidAgent
from droidrun.tools import load_tools


async def main(llm):
    # `llm` is the language model instance you have already configured.
    # Load tools for the target device
    tool_list, tools_instance = await load_tools(serial="device_id")

    # Create agent
    agent = DroidAgent(
        goal="Open Settings and enable dark mode",
        llm=llm,                        # Language model
        tools_instance=tools_instance,  # Tool provider
        tool_list=tool_list,            # Available tools
        vision=True,                    # Enable vision
        reasoning=True,                 # Enable planning
        max_steps=15,                   # Maximum planning steps
        timeout=1000,                   # Overall timeout
        max_retries=3,                  # Retry attempts
        enable_tracing=True,            # Execution tracing
        debug=False                     # Debug mode
    )

    # Run the agent (await requires an async context)
    return await agent.run()


# Entry point, once `llm` exists:
# result = asyncio.run(main(llm))

🛠️ Execution Modes

DroidAgent supports two execution modes:

Direct Execution (reasoning=False)

agent = DroidAgent(
    goal="Take a screenshot",
    llm=llm,
    tools_instance=tools_instance,
    tool_list=tool_list,
    reasoning=False  # No planning
)

  • Treats the goal as a single task
  • Executes it directly with CodeActAgent
  • Suitable for simple, straightforward tasks

Planning Mode (reasoning=True)

agent = DroidAgent(
    goal="Find and install Twitter app",
    llm=llm,
    tools_instance=tools_instance,
    tool_list=tool_list,
    reasoning=True  # With planning
)

  • PlannerAgent creates a step-by-step plan
  • Handles complex, multi-step tasks
  • Adaptively updates the plan based on results
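Since the two modes differ only in the `reasoning` flag, one way to keep them side by side is a small helper that builds the constructor arguments and flips just that flag. This is a sketch: `build_agent_kwargs` and its `multi_step` parameter are hypothetical names, and only the keyword names come from the examples above.

```python
from typing import Any, Dict


def build_agent_kwargs(
    goal: str,
    llm: Any,
    tools_instance: Any,
    tool_list: Any,
    multi_step: bool,
) -> Dict[str, Any]:
    """Collect DroidAgent keyword arguments, enabling planning only for multi-step goals."""
    return {
        "goal": goal,
        "llm": llm,
        "tools_instance": tools_instance,
        "tool_list": tool_list,
        "reasoning": multi_step,  # True: planning mode, False: direct execution
    }
```

The result would then be unpacked into the constructor, for example `DroidAgent(**build_agent_kwargs("Take a screenshot", llm, tools_instance, tool_list, multi_step=False))`.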

📊 Execution Results

The agent returns detailed execution results:

result = await agent.run()

# Check success
if result["success"]:
    print("Goal completed successfully!")
else:
    print(f"Failed: {result['reason']}")

# Access execution details
print(f"Steps executed: {result['steps']}")
print(f"Task history: {result['task_history']}")
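For logging, the same fields can be folded into a single report string. The helper below is a minimal sketch: `summarize_result` is a hypothetical name, it assumes the result behaves like a dict with the keys shown above, and it makes no assumption about the shape of individual `task_history` entries beyond their being printable.

```python
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> str:
    """Build a short multi-line report from a run result dict."""
    succeeded = bool(result.get("success"))
    status = "succeeded" if succeeded else "failed"
    lines = [f"Run {status} after {result.get('steps', 0)} step(s)."]
    if not succeeded:
        # 'reason' is only meaningful for failed runs
        lines.append(f"Reason: {result.get('reason', 'unknown')}")
    # Number each history entry; entries are formatted with str()
    for index, entry in enumerate(result.get("task_history") or [], start=1):
        lines.append(f"  {index}. {entry}")
    return "\n".join(lines)
```

Using `.get()` keeps the helper from raising if a key is absent, which is useful when the exact result shape differs between versions.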

💡 Best Practices

  1. Use Planning for Complex Tasks

    • Enable reasoning for multi-step operations
    • Direct mode is faster for simple tasks
  2. Enable Vision When Needed

    • Use for UI-heavy interactions
    • Provides better screen understanding
  3. Set Appropriate Timeouts

    • Adjust based on task complexity
    • Consider device performance
  4. Handle Errors Properly

    • Configure max_retries for robustness
    • Check task_history for debugging
  5. Memory Usage

    • Use tools.remember() for important information
    • Agent preserves context between planning iterations
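For the timeout practice above, an outer guard around `agent.run()` can complement the agent's own `timeout` setting. The sketch below uses the standard library's `asyncio.wait_for`; `run_with_deadline` is a hypothetical helper, the failure dict it returns merely mirrors the result keys shown earlier, and whether DroidAgent cleans up gracefully when cancelled is not something this page confirms.

```python
import asyncio
from typing import Any, Dict


async def run_with_deadline(agent: Any, seconds: float) -> Dict[str, Any]:
    """Await agent.run(), giving up after `seconds` of wall-clock time."""
    try:
        # wait_for cancels the pending run() when the deadline passes
        return await asyncio.wait_for(agent.run(), timeout=seconds)
    except asyncio.TimeoutError:
        # Hypothetical failure record shaped like a normal result
        return {
            "success": False,
            "reason": f"no result within {seconds} s",
            "steps": 0,
            "task_history": [],
        }
```

Inside an async function this would be called as `result = await run_with_deadline(agent, 120)`; slower devices or longer tasks would need a larger value.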