MobileAgent - A wrapper class that coordinates the planning and execution of tasks to achieve a user’s goal on an Android or iOS device.
MobileAgent
class MobileAgent(Workflow)
A wrapper class that coordinates between agents to achieve a user’s goal.
Architecture:
- When reasoning=False: Uses FastAgent directly for immediate execution
- When reasoning=True: Uses ManagerAgent (planning) + ExecutorAgent (actions)
MobileAgent.__init__
def __init__(
    goal: str,
    config: MobileConfig | None = None,
    llms: dict[str, LLM] | LLM | None = None,
    custom_tools: dict = None,
    credentials: Union[dict, CredentialManager, None] = None,
    variables: dict | None = None,
    output_model: Type[BaseModel] | None = None,
    prompts: dict[str, str] | None = None,
    driver: "DeviceDriver | None" = None,
    state_provider: "StateProvider | None" = None,
    timeout: int = 1000,
)
Initialize the MobileAgent wrapper.
Arguments:
goal str - User’s goal or command to execute
config MobileConfig | None - Full configuration object (required if llms not provided). Contains agent settings, LLM profiles, device config, and more.
llms dict[str, LLM] | LLM | None - Optional LLM configuration:
  - dict[str, LLM]: Agent-specific LLMs with keys “manager”, “executor”, “fast_agent”, “app_opener”, “structured_output”
  - LLM: A single LLM instance used for all agents
  - None: LLMs will be loaded from config.llm_profiles
custom_tools dict - Custom tool definitions. Format: {"tool_name": {"parameters": {...}, "description": "...", "function": callable}}. These are merged with auto-generated credential tools.
credentials Union[dict, CredentialManager, None] - Direct credential mapping {"SECRET_ID": "value"}, a CredentialManager instance, or None. If None, credentials will be loaded from config.credentials if available.
variables dict | None - Custom variables accessible throughout execution. Available in shared_state.custom_variables.
output_model Type[BaseModel] | None - Pydantic model for structured output extraction from final answer. If provided, the final answer will be parsed into this model.
prompts dict[str, str] | None - Custom Jinja2 prompt templates to override defaults. Keys: “fast_agent_system”, “fast_agent_user”, “manager_system”, “executor_system”. Values: Jinja2 template strings (NOT file paths).
driver DeviceDriver | None - Pre-configured device driver instance (AndroidDriver or IOSDriver). If None, a driver will be created from config.
state_provider StateProvider | None - Pre-configured state provider instance. If None, a state provider will be created from config.
timeout int - Workflow timeout in seconds (default: 1000)
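The custom_tools spec described above can be exercised on its own with a small standalone dispatcher. This is only a sketch of how such a spec might be consumed, not mobilerun's actual dispatch logic; `validate_and_call` is a hypothetical helper that checks required parameters against the spec before invoking the tool function.

```python
# Standalone sketch of the custom_tools spec format (illustrative only;
# mobilerun performs its own tool dispatch internally).

def search_database(query: str) -> str:
    """Search the local database (stub implementation)."""
    return f"Results for: {query}"

custom_tools = {
    "search_database": {
        "parameters": {"query": {"type": "string", "required": True}},
        "description": "Search the local database for information",
        "function": search_database,
    }
}

def validate_and_call(tools: dict, name: str, args: dict):
    """Hypothetical helper: check required parameters, then invoke the tool."""
    spec = tools[name]
    missing = [p for p, meta in spec["parameters"].items()
               if meta.get("required") and p not in args]
    if missing:
        raise ValueError(f"{name}: missing required parameters {missing}")
    return spec["function"](**args)

print(validate_and_call(custom_tools, "search_database", {"query": "cats"}))
```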
Basic initialization pattern (recommended):
from mobilerun import MobileAgent
from mobilerun.config_manager import MobileConfig
# Initialize with default config
config = MobileConfig()
# Create agent (LLMs loaded from config.llm_profiles)
agent = MobileAgent(
    goal="Open Chrome and search for Mobilerun",
    config=config
)
# Run agent
result = await agent.run()
Loading from YAML (optional):
from mobilerun import MobileAgent
from mobilerun.config_manager import MobileConfig
# Load config from config.yaml
config = MobileConfig.from_yaml("config.yaml")
# Create agent (LLMs loaded from config.llm_profiles)
agent = MobileAgent(
    goal="Open Chrome and search for Mobilerun",
    config=config
)
# Run agent
result = await agent.run()
Custom LLM dictionary pattern:
from mobilerun import MobileAgent
from mobilerun.config_manager import MobileConfig
from llama_index.llms.openai import OpenAI
from llama_index.llms.anthropic import Anthropic
# Initialize config
config = MobileConfig()
# Create custom LLMs
llms = {
    "manager": Anthropic(model="claude-sonnet-4-5-latest", temperature=0.2),
    "executor": Anthropic(model="claude-sonnet-4-5-latest", temperature=0.1),
    "fast_agent": OpenAI(model="gpt-4o", temperature=0.2),
    "app_opener": OpenAI(model="gpt-4o-mini", temperature=0.0),
    "structured_output": OpenAI(model="gpt-4o-mini", temperature=0.0),
}
# Create agent with custom LLMs
agent = MobileAgent(
    goal="Send a message to John",
    llms=llms,
    config=config
)
result = await agent.run()
Single LLM pattern:
from mobilerun import MobileAgent
from mobilerun.config_manager import MobileConfig
from llama_index.llms.openai import OpenAI
# Initialize config
config = MobileConfig()
# Use same LLM for all agents
llm = OpenAI(model="gpt-4o", temperature=0.2)
agent = MobileAgent(
    goal="Take a screenshot and save it",
    llms=llm,
    config=config
)
result = await agent.run()
Custom tools and credentials:
from mobilerun import MobileAgent
from mobilerun.config_manager import MobileConfig
# Initialize config
config = MobileConfig()
# Define custom tool
def search_database(query: str) -> str:
    """Search the local database."""
    # Your implementation
    return f"Results for: {query}"

custom_tools = {
    "search_database": {
        "parameters": {
            "query": {"type": "string", "required": True},
        },
        "description": "Search the local database for information",
        "function": search_database
    }
}
# Provide credentials directly
credentials = {
    "GMAIL_USERNAME": "user@gmail.com",
    "GMAIL_PASSWORD": "secret123"
}
agent = MobileAgent(
    goal="Search database and email results",
    config=config,
    custom_tools=custom_tools,
    credentials=credentials
)
result = await agent.run()
Structured output extraction:
from mobilerun import MobileAgent
from mobilerun.config_manager import MobileConfig
from pydantic import BaseModel, Field
# Initialize config
config = MobileConfig()
# Define output schema
class WeatherInfo(BaseModel):
    """Weather information."""
    temperature: float = Field(description="Temperature in Celsius")
    condition: str = Field(description="Weather condition")
    humidity: int = Field(description="Humidity percentage")

agent = MobileAgent(
    goal="Open weather app and get current weather",
    config=config,
    output_model=WeatherInfo
)
result = await agent.run()
# Access structured output
if result.success and result.structured_output:
    weather = result.structured_output  # WeatherInfo object
    print(f"Temperature: {weather.temperature}°C")
    print(f"Condition: {weather.condition}")
MobileAgent.run
async def run(*args, **kwargs) -> ResultEvent
Run the MobileAgent workflow.
Returns:
ResultEvent - Result object with the following attributes:
success (bool): True if task completed successfully
reason (str): Success message or failure reason
steps (int): Number of steps executed
structured_output (Any): Parsed Pydantic model (if output_model provided, otherwise None)
Usage:
from mobilerun import MobileAgent
from mobilerun.config_manager import MobileConfig
# Initialize config
config = MobileConfig()
# Create and run agent
agent = MobileAgent(goal="...", config=config)
result = await agent.run()
print(f"Success: {result.success}")
print(f"Reason: {result.reason}")
print(f"Steps: {result.steps}")
Streaming events:
from mobilerun import MobileAgent
from mobilerun.config_manager import MobileConfig
# Initialize config
config = MobileConfig()
agent = MobileAgent(goal="...", config=config)
# Stream events as they occur
async for event in agent.run_event_stream():
    if isinstance(event, ManagerInputEvent):
        print("Manager is planning...")
    elif isinstance(event, ExecutorInputEvent):
        print("Executor is taking action...")
    elif isinstance(event, ToolExecutionEvent):
        print(f"Tool executed: {event.tool_name} - {event.summary}")
    elif isinstance(event, ResultEvent):
        # Final result
        print(f"Success: {event.success}")
        print(f"Reason: {event.reason}")
MobileAgent.send_user_message
def send_user_message(message: str) -> QueuedUserMessage
Inject an external user message into the running workflow. The message is queued and consumed by the active agent (FastAgent or Manager) at its next step.
Arguments:
message str - The message to inject
Returns:
QueuedUserMessage - Object with id, message, and queued_at_step fields
Usage:
import asyncio
agent = MobileAgent(goal="...", config=config)
# Start the agent as a background task so it begins executing
task = asyncio.create_task(agent.run())
# Wait for the agent to be running before injecting messages.
# In practice, call send_user_message from an external trigger
# (e.g., a UI callback or API endpoint) once the agent is active.
await asyncio.sleep(1)
# Inject a message mid-run
queued = agent.send_user_message("Actually, search for 'Python' instead")
result = await task
If the agent has already finished or the message arrives at max_steps, the message will be dropped and an ExternalUserMessageDroppedEvent is emitted. Pending messages also block complete() until they are consumed.
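This queue-then-consume lifecycle can be sketched with plain asyncio. The snippet is illustrative only (`SketchAgent` is not mobilerun code): a message injected while the loop is running is consumed at the agent's next step, and anything still queued when the loop ends is dropped.

```python
import asyncio

class SketchAgent:
    """Minimal illustration of mid-run message injection (not mobilerun code)."""
    def __init__(self, max_steps: int = 3):
        self.max_steps = max_steps
        self.pending: list[str] = []
        self.applied: list[str] = []
        self.dropped: list[str] = []

    def send_user_message(self, message: str) -> None:
        self.pending.append(message)  # queue the message for the next step

    async def run(self) -> None:
        for _ in range(self.max_steps):
            await asyncio.sleep(0)      # yield so injections can happen mid-run
            while self.pending:         # consume queued messages at this step
                self.applied.append(self.pending.pop(0))
        self.dropped.extend(self.pending)  # anything left at the end is dropped
        self.pending.clear()

async def main():
    agent = SketchAgent()
    task = asyncio.create_task(agent.run())
    await asyncio.sleep(0)  # let the agent start its first step
    agent.send_user_message("search for 'Python' instead")
    await task
    return agent.applied

print(asyncio.run(main()))
```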
Event Types
MobileAgent emits various events during execution:
Workflow Events:
StartEvent - Workflow started
ManagerInputEvent - Manager planning phase started
ManagerContextEvent - Manager received context for planning
ManagerResponseEvent - Manager intermediate response
ManagerPlanEvent - Manager created a plan
ManagerPlanDetailsEvent - Manager plan details
ExecutorInputEvent - Executor action phase started
ExecutorContextEvent - Executor received context
ExecutorResponseEvent - Executor intermediate response
ExecutorActionEvent - Executor action details
ExecutorActionResultEvent - Executor action result details
ExecutorResultEvent - Executor completed an action
ExternalUserMessageAppliedEvent - External message consumed by agent
ExternalUserMessageDroppedEvent - External message dropped (e.g., at max steps)
FastAgentExecuteEvent - FastAgent started (direct mode)
FastAgentResultEvent - FastAgent completed
FinalizeEvent - Workflow finalizing
StopEvent - Workflow completed
Common Events:
ToolExecutionEvent - Emitted by ToolRegistry after every tool dispatch (contains tool_name, tool_args, success, summary)
ScreenshotEvent - Screenshot captured
RecordUIStateEvent - UI state recorded
Configuration
MobileAgent uses a hierarchical configuration system. See the Configuration Guide for details.
Key configuration options:
agent:
  max_steps: 15        # Maximum execution steps
  reasoning: false     # Enable Manager/Executor workflow
  fast_agent:
    vision: false      # Enable screenshot analysis
  manager:
    vision: false      # Enable screenshot analysis
  executor:
    vision: false      # Enable screenshot analysis
device:
  serial: null         # Device serial (null = auto-detect)
  platform: android    # "android" or "ios"
  use_tcp: false       # TCP vs content provider
logging:
  debug: false         # Debug logging
  save_trajectory: none  # Trajectory saving: "none", "step", "action"
tracing:
  enabled: false       # Arize Phoenix tracing
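"Hierarchical" here means nested sections override defaults level by level. A generic deep-merge sketch (stdlib only; MobileConfig's actual loader may differ) conveys the idea:

```python
# Generic deep-merge sketch of hierarchical config overriding (illustrative;
# mobilerun's MobileConfig has its own loading logic).

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into sections
        else:
            merged[key] = value  # leaf values replace defaults
    return merged

defaults = {"agent": {"max_steps": 15, "reasoning": False},
            "device": {"platform": "android", "serial": None}}
user = {"agent": {"reasoning": True}, "device": {"serial": "emulator-5554"}}

print(deep_merge(defaults, user))
```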
Advanced Usage
Custom device configuration:
from mobilerun import MobileAgent, DeviceConfig
from mobilerun.config_manager import MobileConfig
# Initialize config with device settings
device_config = DeviceConfig(serial="emulator-5554", use_tcp=True)
config = MobileConfig(device=device_config)
agent = MobileAgent(
    goal="Open settings",
    config=config,
)
result = await agent.run()
Custom variables:
from mobilerun import MobileAgent
from mobilerun.config_manager import MobileConfig
# Initialize config
config = MobileConfig()
agent = MobileAgent(
    goal="Complete task using context",
    config=config,
    variables={
        "user_name": "Alice",
        "project_id": "12345",
        "api_endpoint": "https://api.example.com"
    }
)
result = await agent.run()
Variables are accessible in shared_state.custom_variables throughout execution and can be referenced in custom tools or scripts.
Custom prompts:
from mobilerun import MobileAgent
from mobilerun.config_manager import MobileConfig
# Initialize config
config = MobileConfig()
# Override default prompts with custom Jinja2 templates
custom_prompts = {
    "fast_agent_system": "You are a specialized agent for {{ platform }} devices...",
    "manager_system": "You are a planning agent. Your goal: {{ instruction }}..."
}
agent = MobileAgent(
    goal="Complete specialized task",
    config=config,
    prompts=custom_prompts
)
result = await agent.run()
Available prompt keys: “fast_agent_system”, “fast_agent_user”, “manager_system”, “executor_system”
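Because prompt values are template strings rather than file paths, a render can be previewed without the library. The sketch below mimics Jinja2's `{{ var }}` substitution with a regex (mobilerun uses real Jinja2; `render` here is only an illustration, not part of the API):

```python
import re

# Illustrative stand-in for Jinja2 rendering: prompt values are template
# *strings*, not file paths. Not part of the mobilerun API.
def render(template: str, **context) -> str:
    """Replace {{ name }} placeholders with values from context."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda m: str(context.get(m.group(1), m.group(0))),
                  template)

custom_prompts = {
    "fast_agent_system": "You are a specialized agent for {{ platform }} devices...",
}

print(render(custom_prompts["fast_agent_system"], platform="android"))
```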
Notes
- Config requirement: Either config or llms must be provided. If llms is not provided, config is required to load LLMs from profiles.
- Vision mode: Enabling vision (agent_config.*.vision = True) increases token usage as screenshots are sent to the LLM.
- Reasoning mode: reasoning=True uses the Manager/Executor workflow for complex planning; reasoning=False uses FastAgent for direct execution.
- Timeout: Default is 1000 seconds. Increase for long-running tasks.
- Credentials: When credentials are provided, the type_secret(secret_id, index) tool is automatically registered. The agent never sees the actual secret values, only the secret IDs.