DroidAgent - A wrapper class that coordinates the planning and execution of tasks to achieve a user’s goal on an Android or iOS device.

DroidAgent

class DroidAgent(Workflow)
A wrapper class that coordinates between agents to achieve a user’s goal.
Architecture:
  • When reasoning=False: Uses CodeActAgent directly for immediate execution
  • When reasoning=True: Uses ManagerAgent (planning) + ExecutorAgent (actions) + ScripterAgent (off-device operations)

DroidAgent.__init__

def __init__(
    goal: str,
    config: DroidrunConfig | None = None,
    llms: dict[str, LLM] | LLM | None = None,
    agent_config: AgentConfig | None = None,
    device_config: DeviceConfig | None = None,
    tools: "Tools | ToolsConfig | None" = None,
    logging_config: LoggingConfig | None = None,
    tracing_config: TracingConfig | None = None,
    telemetry_config: TelemetryConfig | None = None,
    custom_tools: dict | None = None,
    credentials: "CredentialsConfig | dict | None" = None,
    variables: dict | None = None,
    output_model: Type[BaseModel] | None = None,
    prompts: dict[str, str] | None = None,
    timeout: int = 1000
)
Initialize the DroidAgent wrapper.
Arguments:
  • goal str - User’s goal or command to execute
  • config DroidrunConfig | None - Full configuration object (required if llms is not provided). Contains agent settings, LLM profiles, device config, and more. Any individual override that is also passed (agent_config, device_config, etc.) takes precedence over the matching section of config.
  • llms dict[str, LLM] | LLM | None - Optional LLM configuration:
    • dict[str, LLM]: Agent-specific LLMs with keys: “manager”, “executor”, “codeact”, “text_manipulator”, “app_opener”, “scripter”, “structured_output”
    • LLM: Single LLM instance used for all agents
    • None: LLMs will be loaded from config.llm_profiles
  • agent_config AgentConfig | None - Agent configuration override (max_steps, reasoning mode, vision settings, prompts). Overrides config.agent if provided.
  • device_config DeviceConfig | None - Device configuration override (serial, platform, use_tcp). Overrides config.device if provided.
  • tools Tools | ToolsConfig | None - Tools configuration:
    • Tools: Pre-configured Tools instance (AdbTools or IOSTools)
    • ToolsConfig: Configuration for creating tools (e.g., allow_drag)
    • None: Tools will be created from config.tools and device_config
  • logging_config LoggingConfig | None - Logging configuration override (debug, save_trajectory). Overrides config.logging if provided.
  • tracing_config TracingConfig | None - Tracing configuration override (enabled). Overrides config.tracing if provided.
  • telemetry_config TelemetryConfig | None - Telemetry configuration override (enabled). Overrides config.telemetry if provided.
  • custom_tools dict | None - Custom tool definitions. Format: {"tool_name": {"signature": "...", "description": "...", "function": callable}}. These are merged with auto-generated credential tools.
  • credentials CredentialsConfig | dict | None - Credential configuration:
    • CredentialsConfig: From config.credentials (enabled, file_path)
    • dict: Direct credential mapping {"SECRET_ID": "value"}
    • None: Uses config.credentials if available
  • variables dict | None - Custom variables accessible throughout execution. Available in shared_state.custom_variables.
  • output_model Type[BaseModel] | None - Pydantic model for structured output extraction from final answer. If provided, the final answer will be parsed into this model.
  • prompts dict[str, str] | None - Custom Jinja2 prompt templates to override defaults. Keys: “codeact_system”, “codeact_user”, “manager_system”, “executor_system”, “scripter_system”. Values: Jinja2 template strings (NOT file paths).
  • timeout int - Workflow timeout in seconds (default: 1000)
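The override and llms rules described in the arguments above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: FakeConfig, FakeDeviceConfig, resolve_section, and normalize_llms are hypothetical names, and passing a partial dict through unchanged is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


# Hypothetical stand-ins for DroidrunConfig and DeviceConfig (not the real classes).
@dataclass
class FakeDeviceConfig:
    serial: Optional[str] = None
    platform: str = "android"
    use_tcp: bool = False


@dataclass
class FakeConfig:
    device: FakeDeviceConfig = field(default_factory=FakeDeviceConfig)


# Agent keys listed in the llms argument description.
AGENT_KEYS = (
    "manager", "executor", "codeact", "text_manipulator",
    "app_opener", "scripter", "structured_output",
)


def resolve_section(base: Any, override: Any) -> Any:
    """Return the override when one is given, otherwise the section from config."""
    return override if override is not None else base


def normalize_llms(llms: Any) -> Optional[dict]:
    """Expand the three accepted shapes of llms into a per-agent mapping."""
    if llms is None:
        return None  # caller would fall back to config.llm_profiles
    if isinstance(llms, dict):
        return dict(llms)  # assumption: agent-specific mapping is used as given
    return {key: llms for key in AGENT_KEYS}  # single instance shared by all agents
```

For example, resolve_section(config.device, None) returns the section from config, while passing a FakeDeviceConfig(serial="emulator-5554") returns that override instead.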
Basic initialization pattern (recommended):
from droidrun import DroidAgent
from droidrun.config_manager.config_manager import DroidrunConfig

# Initialize with default config
config = DroidrunConfig()

# Create agent (LLMs loaded from config.llm_profiles)
agent = DroidAgent(
    goal="Open Chrome and search for Droidrun",
    config=config
)

# Run agent
result = await agent.run()
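The examples on this page use a bare await, which only works where an event loop is already running (for instance a notebook). In a regular script the call would normally be wrapped in a coroutine and started with asyncio.run. The sketch below shows that shape; FakeAgent and FakeResult are hypothetical stand-ins so the snippet runs without a device, and in real use DroidAgent would take their place.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in mirroring a few ResultEvent attributes."""
    success: bool
    reason: str
    steps: int


class FakeAgent:
    """Hypothetical stand-in for DroidAgent; it does not touch any device."""

    def __init__(self, goal: str) -> None:
        self.goal = goal

    async def run(self) -> FakeResult:
        await asyncio.sleep(0)  # yield to the event loop like a real async call
        return FakeResult(success=True, reason=f"Completed: {self.goal}", steps=1)


async def main() -> FakeResult:
    agent = FakeAgent(goal="Open Chrome and search for Droidrun")
    return await agent.run()


if __name__ == "__main__":
    result = asyncio.run(main())
    print(result.success, result.reason)
```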
Loading from YAML (optional):
from droidrun import DroidAgent
from droidrun.config_manager.config_manager import DroidrunConfig

# Load config from config.yaml
config = DroidrunConfig.from_yaml("config.yaml")

# Create agent (LLMs loaded from config.llm_profiles)
agent = DroidAgent(
    goal="Open Chrome and search for Droidrun",
    config=config
)

# Run agent
result = await agent.run()
Custom LLM dictionary pattern:
from droidrun import DroidAgent
from droidrun.config_manager.config_manager import DroidrunConfig
from llama_index.llms.openai import OpenAI
from llama_index.llms.anthropic import Anthropic

# Initialize config
config = DroidrunConfig()

# Create custom LLMs
llms = {
    "manager": Anthropic(model="claude-sonnet-4-5-latest", temperature=0.2),
    "executor": Anthropic(model="claude-sonnet-4-5-latest", temperature=0.1),
    "codeact": OpenAI(model="gpt-4o", temperature=0.2),
    "text_manipulator": OpenAI(model="gpt-4o-mini", temperature=0.3),
    "app_opener": OpenAI(model="gpt-4o-mini", temperature=0.0),
    "scripter": OpenAI(model="gpt-4o", temperature=0.1),
    "structured_output": OpenAI(model="gpt-4o-mini", temperature=0.0),
}

# Create agent with custom LLMs
agent = DroidAgent(
    goal="Send a message to John",
    llms=llms,
    config=config
)

result = await agent.run()
Single LLM pattern:
from droidrun import DroidAgent
from droidrun.config_manager.config_manager import DroidrunConfig
from llama_index.llms.openai import OpenAI

# Initialize config
config = DroidrunConfig()

# Use same LLM for all agents
llm = OpenAI(model="gpt-4o", temperature=0.2)

agent = DroidAgent(
    goal="Take a screenshot and save it",
    llms=llm,
    config=config
)

result = await agent.run()
Custom tools and credentials:
from droidrun import DroidAgent
from droidrun.config_manager.config_manager import DroidrunConfig

# Initialize config
config = DroidrunConfig()

# Define custom tool
def search_database(query: str) -> str:
    """Search the local database."""
    # Your implementation
    return f"Results for: {query}"

custom_tools = {
    "search_database": {
        "signature": "search_database(query: str) -> str",
        "description": "Search the local database for information",
        "function": search_database
    }
}

# Provide credentials directly
credentials = {
    "GMAIL_USERNAME": "user@gmail.com",
    "GMAIL_PASSWORD": "secret123"
}

agent = DroidAgent(
    goal="Search database and email results",
    config=config,
    custom_tools=custom_tools,
    credentials=credentials
)

result = await agent.run()
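Because custom_tools is a plain dict, a malformed entry is easy to miss until the agent runs. The helper below is a hypothetical pre-flight check, not part of Droidrun: it verifies the three keys from the documented format and, as an assumed convention taken from the example above, that the signature string starts with the tool name.

```python
# Keys required by the documented custom_tools format.
REQUIRED_KEYS = {"signature", "description", "function"}


def validate_custom_tools(custom_tools: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for name, spec in custom_tools.items():
        missing = REQUIRED_KEYS - set(spec)
        if missing:
            problems.append(f"{name}: missing keys {sorted(missing)}")
            continue
        if not callable(spec["function"]):
            problems.append(f"{name}: 'function' is not callable")
        # Assumed convention: the signature text begins with the tool name.
        if not spec["signature"].startswith(f"{name}("):
            problems.append(f"{name}: signature does not start with '{name}('")
    return problems
```

Calling validate_custom_tools(custom_tools) before constructing the agent gives early feedback on missing keys without needing a device.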
Structured output extraction:
from droidrun import DroidAgent
from droidrun.config_manager.config_manager import DroidrunConfig
from pydantic import BaseModel, Field

# Initialize config
config = DroidrunConfig()

# Define output schema
class WeatherInfo(BaseModel):
    """Weather information."""
    temperature: float = Field(description="Temperature in Celsius")
    condition: str = Field(description="Weather condition")
    humidity: int = Field(description="Humidity percentage")

agent = DroidAgent(
    goal="Open weather app and get current weather",
    config=config,
    output_model=WeatherInfo
)

result = await agent.run()

# Access structured output
if result.success and result.structured_output:
    weather = result.structured_output  # WeatherInfo object
    print(f"Temperature: {weather.temperature}°C")
    print(f"Condition: {weather.condition}")

DroidAgent.run

async def run(*args, **kwargs) -> ResultEvent
Run the DroidAgent workflow.
Returns:
  • ResultEvent - Result object with the following attributes:
    • success (bool): True if task completed successfully
    • reason (str): Success message or failure reason
    • steps (int): Number of steps executed
    • structured_output (Any): Parsed Pydantic model (if output_model provided, otherwise None)
Usage:
from droidrun import DroidAgent
from droidrun.config_manager.config_manager import DroidrunConfig

# Initialize config
config = DroidrunConfig()

# Create and run agent
agent = DroidAgent(goal="...", config=config)
result = await agent.run()

print(f"Success: {result.success}")
print(f"Reason: {result.reason}")
print(f"Steps: {result.steps}")
Streaming events:
from droidrun import DroidAgent
from droidrun.config_manager.config_manager import DroidrunConfig

# Initialize config
config = DroidrunConfig()

agent = DroidAgent(goal="...", config=config)

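# The event classes used below (ManagerInputEvent, ExecutorInputEvent,
# TapActionEvent, ResultEvent) must be imported first; their import paths
# are not shown here.
# Stream events as they occur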
async for event in agent.run_event_stream():
    if isinstance(event, ManagerInputEvent):
        print("Manager is planning...")
    elif isinstance(event, ExecutorInputEvent):
        print("Executor is taking action...")
    elif isinstance(event, TapActionEvent):
        print(f"Tapping element at {event.x}, {event.y}")
    elif isinstance(event, ResultEvent):
        # Final result
        print(f"Success: {event.success}")
        print(f"Reason: {event.reason}")

Event Types

DroidAgent emits various events during execution.
Workflow Events:
  • StartEvent - Workflow started
  • ManagerInputEvent - Manager planning phase started
  • ManagerContextEvent - Manager received context for planning
  • ManagerResponseEvent - Manager intermediate response
  • ManagerPlanEvent - Manager created a plan
  • ManagerPlanDetailsEvent - Manager plan details
  • ExecutorInputEvent - Executor action phase started
  • ExecutorContextEvent - Executor received context
  • ExecutorResponseEvent - Executor intermediate response
  • ExecutorActionEvent - Executor action details
  • ExecutorActionResultEvent - Executor action result details
  • ExecutorResultEvent - Executor completed an action
  • ScripterExecutorInputEvent - ScripterAgent started
  • ScripterExecutorResultEvent - ScripterAgent completed
  • CodeActExecuteEvent - CodeActAgent started (direct mode)
  • CodeActResultEvent - CodeActAgent completed
  • FinalizeEvent - Workflow finalizing
  • StopEvent - Workflow completed
Action Events:
  • TapActionEvent - UI element tapped
  • SwipeActionEvent - Swipe gesture performed
  • DragActionEvent - Drag gesture performed
  • InputTextActionEvent - Text input
  • KeyPressActionEvent - Key press action
  • StartAppEvent - App launched
State Events:
  • ScreenshotEvent - Screenshot captured
  • RecordUIStateEvent - UI state recorded
  • MacroEvent - Macro action recorded
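With this many event types, a long isinstance chain becomes hard to maintain. One alternative is a table that maps event classes to handlers, sketched below. The event classes and fake_event_stream here are hypothetical stand-ins that reuse names from the lists above so the snippet runs on its own; their fields are assumptions apart from the x and y attributes shown in the streaming example.

```python
import asyncio
from dataclasses import dataclass


# Hypothetical stand-ins for Droidrun event classes (not the real definitions).
@dataclass
class TapActionEvent:
    x: int
    y: int


@dataclass
class ScreenshotEvent:
    pass


@dataclass
class ResultEvent:
    success: bool
    reason: str


async def fake_event_stream():
    """Stand-in for the agent's event stream."""
    yield ScreenshotEvent()
    yield TapActionEvent(x=120, y=480)
    yield ResultEvent(success=True, reason="done")


# Map each event type of interest to a function that formats it.
HANDLERS = {
    TapActionEvent: lambda e: f"tap at ({e.x}, {e.y})",
    ResultEvent: lambda e: f"finished: success={e.success}",
}


async def collect_log(stream) -> list:
    """Consume the stream and format only the events that have a handler."""
    log = []
    async for event in stream:
        handler = HANDLERS.get(type(event))
        if handler is not None:
            log.append(handler(event))
    return log
```

Events without an entry in HANDLERS (ScreenshotEvent here) are skipped, so new event types can be handled by adding a single table entry.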

Configuration

DroidAgent uses a hierarchical configuration system. See the Configuration Guide for details.
Key configuration options:
agent:
  max_steps: 15           # Maximum execution steps
  reasoning: false        # Enable Manager/Executor workflow

  codeact:
    vision: false         # Enable screenshot analysis
    safe_execution: false # Restrict code execution

  manager:
    vision: false         # Enable screenshot analysis

  executor:
    vision: false         # Enable screenshot analysis

device:
  serial: null            # Device serial (null = auto-detect)
  platform: android       # "android" or "ios"
  use_tcp: false          # Communicate over TCP instead of the content provider

logging:
  debug: false            # Debug logging
  save_trajectory: none   # Trajectory saving: "none", "step", "action"

tracing:
  enabled: false          # Arize Phoenix tracing
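The enumerated values in the YAML above (platform and save_trajectory) can be checked before a run. The function below is a hypothetical sanity check over a plain dict with the same shape as that YAML; it is not Droidrun's own validation, and the fallback defaults are simply the values shown above.

```python
ALLOWED_PLATFORMS = {"android", "ios"}
ALLOWED_TRAJECTORY_MODES = {"none", "step", "action"}


def check_settings(settings: dict) -> list:
    """Return a list of problems found in a dict shaped like the YAML above."""
    errors = []
    agent = settings.get("agent", {})
    device = settings.get("device", {})
    logging_cfg = settings.get("logging", {})

    max_steps = agent.get("max_steps", 15)
    if not isinstance(max_steps, int) or max_steps < 1:
        errors.append(f"agent.max_steps must be a positive integer, got {max_steps!r}")

    platform = device.get("platform", "android")
    if platform not in ALLOWED_PLATFORMS:
        errors.append(f"device.platform must be one of {sorted(ALLOWED_PLATFORMS)}, got {platform!r}")

    mode = logging_cfg.get("save_trajectory", "none")
    if mode not in ALLOWED_TRAJECTORY_MODES:
        errors.append(
            f"logging.save_trajectory must be one of {sorted(ALLOWED_TRAJECTORY_MODES)}, got {mode!r}"
        )
    return errors
```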

Advanced Usage

Custom device configuration:
from droidrun import DroidAgent, DeviceConfig
from droidrun.config_manager.config_manager import DroidrunConfig

# Initialize config
config = DroidrunConfig()

device_config = DeviceConfig(serial="emulator-5554", use_tcp=True)

agent = DroidAgent(
    goal="Open settings",
    config=config,
    device_config=device_config
)

result = await agent.run()
Custom variables:
from droidrun import DroidAgent
from droidrun.config_manager.config_manager import DroidrunConfig

# Initialize config
config = DroidrunConfig()

agent = DroidAgent(
    goal="Complete task using context",
    config=config,
    variables={
        "user_name": "Alice",
        "project_id": "12345",
        "api_endpoint": "https://api.example.com"
    }
)

result = await agent.run()
Variables are accessible in shared_state.custom_variables throughout execution and can be referenced in custom tools or scripts.
Custom prompts:
from droidrun import DroidAgent
from droidrun.config_manager.config_manager import DroidrunConfig

# Initialize config
config = DroidrunConfig()

# Override default prompts with custom Jinja2 templates
custom_prompts = {
    "codeact_system": "You are a specialized agent for {{ platform }} devices...",
    "manager_system": "You are a planning agent. Your goal: {{ instruction }}..."
}

agent = DroidAgent(
    goal="Complete specialized task",
    config=config,
    prompts=custom_prompts
)

result = await agent.run()
Available prompt keys: “codeact_system”, “codeact_user”, “manager_system”, “executor_system”, “scripter_system”
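A misspelled prompt key is easy to overlook, so a small check against the documented keys can help. check_prompts below is a hypothetical helper, not a Droidrun function, and how the library itself treats unknown keys is not covered here.

```python
# Prompt keys documented for the prompts argument.
PROMPT_KEYS = {
    "codeact_system", "codeact_user", "manager_system",
    "executor_system", "scripter_system",
}


def check_prompts(prompts: dict) -> list:
    """Flag unknown keys and empty templates in a prompts override dict."""
    problems = []
    for key, template in prompts.items():
        if key not in PROMPT_KEYS:
            problems.append(f"unknown prompt key: {key}")
        elif not isinstance(template, str) or not template.strip():
            problems.append(f"{key}: template must be a non-empty string")
    return problems
```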

Notes

  • Config requirement: Either config or llms must be provided. If llms is not provided, config is required to load LLMs from profiles.
  • Vision mode: Enabling vision (agent_config.*.vision = True) increases token usage as screenshots are sent to the LLM.
  • Reasoning mode: reasoning=True uses Manager/Executor workflow for complex planning. reasoning=False uses CodeActAgent for direct execution.
  • Safe execution: When enabled, restricts imports and builtins in CodeActAgent and ScripterAgent (see safe_execution config).
  • Timeout: Default is 1000 seconds. Increase for long-running tasks.
  • Credentials: Credentials are automatically injected as custom tools (e.g., get_username(), get_password()).
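When credentials are passed as a dict, the values do not have to be hardcoded in source as in the earlier example. One option is to read them from environment variables, as in this sketch. credentials_from_env is a hypothetical helper, and the secret IDs are the ones used in the example above.

```python
import os
from typing import Mapping, Optional, Sequence


def credentials_from_env(
    secret_ids: Sequence[str],
    environ: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build a {"SECRET_ID": "value"} dict from environment variables."""
    env = os.environ if environ is None else environ
    missing = [secret_id for secret_id in secret_ids if secret_id not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {missing}")
    return {secret_id: env[secret_id] for secret_id in secret_ids}
```

The resulting dict has the same shape as the credentials argument, for example credentials_from_env(["GMAIL_USERNAME", "GMAIL_PASSWORD"]).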