DeviceDriver Base Class

Base class defining the interface for all device drivers.

DeviceDriver

class DeviceDriver

Base class for all device drivers. Every method raises NotImplementedError by default. Concrete drivers override the methods they support and list those method names in the class-level supported set, so callers can check capabilities at runtime without introspection.
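
The pattern described above can be sketched roughly as follows. This is a simplified illustration, not the library's code: SketchDriver is a stand-in for DeviceDriver, TapOnlyDriver is a hypothetical concrete driver, and the async signatures are assumed from the await-based examples elsewhere on this page.

```python
import asyncio


class SketchDriver:
    """Simplified stand-in for DeviceDriver (illustrative only)."""

    # Names of the methods a concrete driver actually implements.
    supported: set = set()

    async def tap(self, x: int, y: int) -> None:
        raise NotImplementedError

    async def drag(self, x1: int, y1: int, x2: int, y2: int, duration: float = 3.0) -> None:
        raise NotImplementedError


class TapOnlyDriver(SketchDriver):
    """Hypothetical driver that implements tap() and nothing else."""

    supported = {"tap"}

    def __init__(self) -> None:
        self.taps = []

    async def tap(self, x: int, y: int) -> None:
        # A real driver would send the tap to the device here.
        self.taps.append((x, y))


async def demo():
    driver = TapOnlyDriver()
    if "tap" in driver.supported:
        await driver.tap(10, 20)
    if "drag" in driver.supported:  # False here, so drag() is never called
        await driver.drag(0, 0, 5, 5)
    return driver.taps


taps = asyncio.run(demo())
```

Because the capability set is plain data on the class, a caller can decide what to do before touching the device, rather than catching NotImplementedError after the fact.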

Quick Reference

Driver Methods:

connect(), ensure_connected(), tap(), swipe(), input_text(), press_key(), drag(), start_app(), install_app(), get_apps(), list_packages(), screenshot(), get_ui_tree(), get_date()

Key Attribute:

supported: set[str] - Set of method names the driver implements. Check membership before calling.

Architecture

The tools architecture follows a multi-layer pattern:

DeviceDriver (tools/driver/base.py): Base class for raw device I/O. Methods raise NotImplementedError by default.
Driver Implementations: Platform-specific drivers
- AndroidDriver (tools/driver/android.py): Android devices via ADB + Portal app
- IOSDriver (tools/driver/ios.py): iOS devices via HTTP REST API to Portal app
- StealthDriver (tools/driver/stealth.py): Stealth mode driver
- RecordingDriver (tools/driver/recording.py): Wraps another driver with trajectory recording
- CloudDriver (tools/driver/cloud.py): Cloud-hosted device driver
StateProvider (tools/ui/provider.py): Fetches raw data from a driver, applies filters/formatters, produces UIState
UIState (tools/ui/state.py): Parsed UI elements with element resolution (get_element(), get_element_coords(), get_element_info())
ToolRegistry (agent/tool_registry.py): Central registry of all agent-callable tools
ActionContext (agent/action_context.py): Dependency bag passed as ctx kwarg to action functions
ActionResult (agent/action_result.py): Structured return type (success: bool, summary: str)

Key Components:

DeviceDriver: Raw I/O layer, no element indexing, no event emission
StateProvider: Orchestrates fetching and parsing device state into UIState
UIState: Element lookup by index, coordinate conversion, formatted text output
ActionContext: Bundles driver, ui, shared_state, state_provider for action functions
ToolRegistry: Registers action functions and custom tools for agent use

This design ensures:

Clean separation between device I/O, UI state management, and agent logic
Easy addition of new device types by implementing a new driver
Capability detection via the supported set
Structured results via ActionResult
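
Keeping drivers as a thin I/O layer also means one driver can wrap another. The sketch below shows one possible way to do that, in the spirit of RecordingDriver. CallRecorder and FakeDriver are hypothetical names invented for this example, and the real RecordingDriver may be implemented quite differently.

```python
import asyncio
from typing import Any


class FakeDriver:
    """Hypothetical inner driver used only for this sketch."""

    supported = {"tap"}

    async def tap(self, x: int, y: int) -> None:
        pass  # a real driver would talk to the device here


class CallRecorder:
    """Delegates to an inner driver and logs every supported call."""

    def __init__(self, inner: Any) -> None:
        self._inner = inner
        self.supported = set(inner.supported)  # mirror the inner capabilities
        self.calls = []

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes not defined on CallRecorder itself.
        target = getattr(self._inner, name)
        if name not in self.supported:
            return target

        async def recorded(*args: Any, **kwargs: Any) -> Any:
            self.calls.append((name, args))
            return await target(*args, **kwargs)

        return recorded


async def demo():
    driver = CallRecorder(FakeDriver())
    await driver.tap(3, 4)
    return driver.calls


calls = asyncio.run(demo())
```

The wrapper copies the inner driver's supported set, so capability checks behave the same whether or not recording is enabled.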

Common Interface

All DeviceDriver implementations may provide these methods (check the supported set for availability):

Lifecycle

connect() -> None - Establish connection to the device
ensure_connected() -> None - Connect if not already connected

Input Actions

tap(x: int, y: int) -> None - Tap at absolute pixel coordinates
swipe(x1: int, y1: int, x2: int, y2: int, duration_ms: float = 1000) -> None - Swipe gesture
drag(x1: int, y1: int, x2: int, y2: int, duration: float = 3.0) -> None - Drag gesture
input_text(text: str, clear: bool = False) -> bool - Text input into focused field
press_key(keycode: int) -> None - Key press

App Management

start_app(package: str, activity: str | None = None) -> str - Launch app
install_app(path: str, **kwargs) -> str - Install app
list_packages(include_system: bool = False) -> List[str] - List packages
get_apps(include_system: bool = True) -> List[Dict[str, str]] - Get apps with labels

State / Observation

screenshot(hide_overlay: bool = True) -> bytes - Capture screen as PNG bytes
get_ui_tree() -> Dict[str, Any] - Get raw UI / accessibility tree
get_date() -> str - Get device date/time

StateProvider

class StateProvider:
    def __init__(self, driver: DeviceDriver): ...
    async def get_state(self) -> UIState: ...

Base class for state providers. Subclass to support different platforms.

AndroidStateProvider

class AndroidStateProvider(StateProvider)

Fetches state from an Android device via driver.get_ui_tree(). Includes retry logic (3 attempts). Applies tree filters and formatters to produce a UIState snapshot.
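
The retry behaviour mentioned above might look something like the following. This is a sketch under assumptions: FlakyDriver and fetch_tree_with_retry are hypothetical names, the returned dictionary is a placeholder, and the sketch treats every exception as retryable, which the real provider may not do.

```python
import asyncio
from typing import Any, Dict, Optional


class FlakyDriver:
    """Hypothetical driver whose first two get_ui_tree() calls fail."""

    def __init__(self) -> None:
        self.attempts = 0

    async def get_ui_tree(self) -> Dict[str, Any]:
        self.attempts += 1
        if self.attempts < 3:
            raise ConnectionError("device not ready")
        return {"tree": "placeholder"}


async def fetch_tree_with_retry(driver: Any, attempts: int = 3, delay: float = 0.0) -> Dict[str, Any]:
    """Call driver.get_ui_tree(), trying at most `attempts` times."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return await driver.get_ui_tree()
        except Exception as exc:  # assumption: any failure is worth retrying
            last_error = exc
            await asyncio.sleep(delay)
    raise RuntimeError(f"get_ui_tree failed after {attempts} attempts") from last_error


flaky = FlakyDriver()
tree = asyncio.run(fetch_tree_with_retry(flaky))
```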

UIState

class UIState

Holds parsed UI elements for a single device state snapshot.

Key Methods:

get_element(index: int) -> Dict | None - Recursively find an element by its index
get_element_coords(index: int) -> Tuple[int, int] - Return the centre (x, y) of an element. Raises ValueError when element is missing or has no bounds.
get_element_info(index: int) -> Dict - Return element metadata (text, className, type, child_texts)

Key Attributes:

elements - List of parsed UI elements
formatted_text - Formatted text representation of the UI tree
focused_text - Text of the currently focused element
phone_state - Dict with current activity, keyboard visibility, etc.
screen_width / screen_height - Device screen dimensions
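
To make the element lookup concrete, here is a minimal sketch of a recursive index search and a centre-point calculation. The element layout is an assumption made for illustration (an "index" key, a "children" list, and "bounds" as a "left,top,right,bottom" string), and find_element and element_centre are hypothetical helpers rather than the UIState methods themselves.

```python
from typing import Any, Dict, List, Optional, Tuple


def find_element(elements: List[Dict[str, Any]], index: int) -> Optional[Dict[str, Any]]:
    """Depth-first search for the element whose "index" matches."""
    for element in elements:
        if element.get("index") == index:
            return element
        found = find_element(element.get("children", []), index)
        if found is not None:
            return found
    return None


def element_centre(elements: List[Dict[str, Any]], index: int) -> Tuple[int, int]:
    """Return the centre of an element, or raise ValueError if it cannot be resolved."""
    element = find_element(elements, index)
    if element is None:
        raise ValueError(f"No element with index {index}")
    bounds = element.get("bounds")
    if not bounds:
        raise ValueError(f"Element {index} has no bounds")
    left, top, right, bottom = (int(v) for v in bounds.split(","))
    return (left + right) // 2, (top + bottom) // 2


# Assumed element shape, for illustration only.
tree = [
    {"index": 1, "bounds": "0,0,100,50", "children": [
        {"index": 2, "bounds": "10,10,30,20", "children": []},
        {"index": 3, "children": []},
    ]},
]

centre = element_centre(tree, 2)  # (20, 15)
```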

ActionContext

class ActionContext

Everything an action function needs to interact with the device.

Attributes:

driver - DeviceDriver instance for raw device I/O
ui - UIState instance for element resolution (refreshed each step)
shared_state - DroidAgentState for shared agent state
state_provider - StateProvider for fetching fresh UI state
app_opener_llm - LLM instance for app opening workflow (optional)
credential_manager - CredentialManager instance (optional)
streaming - Whether streaming is enabled

ActionResult

@dataclass
class ActionResult:
    success: bool
    summary: str

Structured return type from action functions. The summary field is what the agent sees.

Action Functions

Action functions live in agent/utils/actions.py and follow this pattern:

async def click(index: int, *, ctx: ActionContext) -> ActionResult:
    """Click the element with the given index."""
    x, y = ctx.ui.get_element_coords(index)
    await ctx.driver.tap(x, y)
    return ActionResult(success=True, summary=f"Clicked on element at ({x}, {y})")
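
Since get_element_coords() raises ValueError for a missing element, an action function may want to turn that into a failed ActionResult. The sketch below shows one way this could be done. safe_click, StubUI, and StubDriver are hypothetical names, ActionResult is redefined locally so the example is self-contained, and a SimpleNamespace stands in for ActionContext.

```python
import asyncio
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class ActionResult:  # local copy with the two fields documented on this page
    success: bool
    summary: str


class StubUI:
    """Hypothetical UI state that only knows element 1."""

    def get_element_coords(self, index: int):
        if index != 1:
            raise ValueError(f"No element with index {index}")
        return 50, 80


class StubDriver:
    """Hypothetical driver whose tap() does nothing."""

    async def tap(self, x: int, y: int) -> None:
        pass


async def safe_click(index: int, *, ctx) -> ActionResult:
    """Like click(), but reports a lookup failure instead of raising."""
    try:
        x, y = ctx.ui.get_element_coords(index)
    except ValueError as exc:
        return ActionResult(success=False, summary=f"Could not click element {index}: {exc}")
    await ctx.driver.tap(x, y)
    return ActionResult(success=True, summary=f"Clicked on element at ({x}, {y})")


ctx = SimpleNamespace(ui=StubUI(), driver=StubDriver())
ok = asyncio.run(safe_click(1, ctx=ctx))
missing = asyncio.run(safe_click(7, ctx=ctx))
```

Because the summary field is what the agent sees, a descriptive failure message gives it something to act on in the next step.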

Available actions:

click(index) - Click UI element by index
click_at(x, y) - Click at screen coordinates
click_area(area) - Click a named screen area
long_press(index) - Long press UI element by index
long_press_at(x, y) - Long press at screen coordinates
type(text, index) - Input text into element
type_secret(secret_id) - Input a credential secret
swipe(coordinate, coordinate2) - Swipe gesture
system_button(button) - Press system buttons (back, home, enter)
open_app(text) - Open app by name
wait(seconds) - Wait for a duration
remember(information) - Store info in agent memory
complete(success, reason) - Mark task as finished
get_state() - Get accessibility tree + phone state
take_screenshot() - Capture device screen

ToolRegistry

class ToolRegistry

Central registry of all agent-callable tools.

Methods:

register(name, fn, params, description) - Register a single tool
register_from_dict(tools_dict) - Register tools from {"name": {"parameters": {...}, "description": "...", "function": callable}} format
disable(tool_names) - Remove tools by name
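
To illustrate how these three methods relate, here is a small stand-in registry. SketchRegistry is a hypothetical class written for this example. It mirrors the method names and the dictionary format listed above, but its internal storage is an assumption and not the real ToolRegistry implementation.

```python
from typing import Any, Callable, Dict, Iterable


class SketchRegistry:
    """Illustrative stand-in for ToolRegistry (not the real implementation)."""

    def __init__(self) -> None:
        self.tools: Dict[str, Dict[str, Any]] = {}

    def register(self, name: str, fn: Callable[..., Any], params: Dict[str, Any], description: str) -> None:
        self.tools[name] = {"function": fn, "parameters": params, "description": description}

    def register_from_dict(self, tools_dict: Dict[str, Dict[str, Any]]) -> None:
        # Each entry uses the same shape as the custom_tools dictionary.
        for name, spec in tools_dict.items():
            self.register(name, spec["function"], spec.get("parameters", {}), spec.get("description", ""))

    def disable(self, tool_names: Iterable[str]) -> None:
        for name in tool_names:
            self.tools.pop(name, None)  # ignore names that were never registered


def echo(text: str) -> str:
    return f"Result: {text}"


registry = SketchRegistry()
registry.register_from_dict({
    "echo": {
        "parameters": {"text": {"type": "string", "required": True}},
        "description": "Echo the input",
        "function": echo,
    },
})
registry.register("noop", lambda: None, {}, "Does nothing")
registry.disable(["noop", "unknown"])
```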

Custom Tool Integration

Adding Custom Tools

def my_custom_tool(param: str, **kwargs) -> str:
    """Custom tool description."""
    return f"Result: {param}"

custom_tools = {
    "my_custom_tool": {
        "parameters": {
            "param": {"type": "string", "required": True},
        },
        "description": "Custom tool description with usage example",
        "function": my_custom_tool
    }
}

agent = DroidAgent(
    goal="Do something",
    config=config,
    custom_tools=custom_tools
)

Platform Comparison

Feature	AndroidDriver	IOSDriver
Connection	ADB + Portal (USB/TCP)	HTTP (Portal app)
tap	Absolute coordinates	Absolute coordinates
swipe	Coordinate-based	Direction-based
drag	Declared but not yet implemented	Not supported
input_text	With clear support	No clear support
press_key	Full Android keycodes	HOME only (BACK/ENTER unsupported)
screenshot	PNG via Portal	PNG via HTTP
get_ui_tree	Accessibility tree + phone state	Accessibility tree + phone state
get_date	Via ADB shell	Not available (returns empty)
get_apps	Full packages with labels	Bundle identifiers only
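
Because swipe is coordinate-based on Android and direction-based on iOS, code that targets both platforms may need to translate between the two. The helper below is a hypothetical sketch of that translation. The direction names ("up", "down", "left", "right") are an assumption and may not match what IOSDriver expects.

```python
def swipe_direction(x1: int, y1: int, x2: int, y2: int) -> str:
    """Classify a coordinate swipe by its dominant axis (screen y grows downward)."""
    dx, dy = x2 - x1, y2 - y1
    if abs(dx) >= abs(dy):
        # Horizontal movement dominates; a zero-length swipe also lands here.
        return "right" if dx >= 0 else "left"
    return "down" if dy > 0 else "up"


direction = swipe_direction(100, 500, 100, 100)  # finger moves toward the top of the screen
```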

Best Practices

1. Check supported methods before calling

if "get_date" in driver.supported:
    date = await driver.get_date()
else:
    date = "Unknown"
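
If this check is repeated in many places, it could be wrapped in a small helper. The following is a sketch only: call_if_supported and DemoDriver are hypothetical names, not part of the library.

```python
import asyncio
from typing import Any


async def call_if_supported(driver: Any, method: str, *args: Any, default: Any = None, **kwargs: Any) -> Any:
    """Await driver.<method>(...) if it is declared in driver.supported, else return default."""
    if method not in driver.supported:
        return default
    return await getattr(driver, method)(*args, **kwargs)


class DemoDriver:
    """Hypothetical driver that only supports list_packages()."""

    supported = {"list_packages"}

    async def list_packages(self, include_system: bool = False):
        return ["com.example.app"]


async def demo():
    driver = DemoDriver()
    date = await call_if_supported(driver, "get_date", default="Unknown")
    packages = await call_if_supported(driver, "list_packages", include_system=False)
    return date, packages


date, packages = asyncio.run(demo())
```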

2. Use ActionContext for agent-level interactions

# Action functions use ctx for all device interaction
async def my_action(param: str, *, ctx: ActionContext) -> ActionResult:
    x, y = ctx.ui.get_element_coords(5)
    await ctx.driver.tap(x, y)
    return ActionResult(success=True, summary="Done")

3. Use StateProvider for UI state

from droidrun.tools import AndroidStateProvider, AndroidDriver

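# StateProvider handles fetching + parsing + retries.
# `driver` is an existing AndroidDriver instance; `my_filter` and `my_formatter`
# are your own tree filter and formatter objects.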
provider = AndroidStateProvider(driver, tree_filter=my_filter, tree_formatter=my_formatter)
ui_state = await provider.get_state()

# UIState provides element lookup
element = ui_state.get_element(5)
x, y = ui_state.get_element_coords(5)

Error Handling

Calling a driver method that is not listed in the supported set raises NotImplementedError:

# Methods not in `supported` set raise NotImplementedError
try:
    await driver.drag(100, 500, 100, 100)
except NotImplementedError:
    print("Drag not supported on this driver")

Action functions return an ActionResult; check its success field before relying on the outcome:

result = await click(5, ctx=ctx)
if not result.success:
    print(f"Action failed: {result.summary}")
