Base class defining the interface for all device drivers.

DeviceDriver

class DeviceDriver
Base class for all device drivers. Every method raises NotImplementedError by default. Concrete drivers override the methods they support and list those method names in the class-level supported set, so callers can check a driver's capabilities at runtime without introspection.

Quick Reference

Driver Methods:
  • connect(), ensure_connected(), tap(), swipe(), input_text(), press_key(), drag(), start_app(), install_app(), get_apps(), list_packages(), screenshot(), get_ui_tree(), get_date()
Key Attribute:
  • supported: set[str] - Set of method names the driver implements. Check membership before calling.
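
The capability pattern described above can be sketched roughly as follows. This is an illustration only, not the library's source: SketchDriver and TapOnlyDriver are hypothetical stand-ins, only two methods are shown, and the methods are written as coroutines because the examples on this page await driver calls.

```python
import asyncio


class SketchDriver:
    """Hypothetical stand-in for DeviceDriver: every method raises by default."""

    # Names of the methods a concrete driver actually implements.
    supported: set[str] = set()

    async def tap(self, x: int, y: int) -> None:
        raise NotImplementedError

    async def get_date(self) -> str:
        raise NotImplementedError


class TapOnlyDriver(SketchDriver):
    """Hypothetical driver that implements tap() and nothing else."""

    supported = {"tap"}

    def __init__(self) -> None:
        self.taps: list[tuple[int, int]] = []

    async def tap(self, x: int, y: int) -> None:
        # A real driver would talk to the device here; this one only records.
        self.taps.append((x, y))


async def demo() -> str:
    driver = TapOnlyDriver()
    if "tap" in driver.supported:
        await driver.tap(10, 20)
    # The membership check avoids the NotImplementedError from the base class.
    if "get_date" in driver.supported:
        return await driver.get_date()
    return "Unknown"
```

Calling asyncio.run(demo()) exercises both branches: the tap is recorded, and the unsupported get_date() call is skipped in favour of a fallback value.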

Architecture

The tools architecture follows a multi-layer pattern:
  1. DeviceDriver (tools/driver/base.py): Base class for raw device I/O. Methods raise NotImplementedError by default.
  2. Driver Implementations: Platform-specific drivers
    • AndroidDriver (tools/driver/android.py): Android devices via ADB + Portal app
    • IOSDriver (tools/driver/ios.py): iOS devices via HTTP REST API to Portal app
    • StealthDriver (tools/driver/stealth.py): Stealth mode driver
    • RecordingDriver (tools/driver/recording.py): Wraps another driver with trajectory recording
    • CloudDriver (tools/driver/cloud.py): Cloud-hosted device driver
  3. StateProvider (tools/ui/provider.py): Fetches raw data from a driver, applies filters/formatters, produces UIState
  4. UIState (tools/ui/state.py): Parsed UI elements with element resolution (get_element(), get_element_coords(), get_element_info())
  5. ToolRegistry (agent/tool_registry.py): Central registry of all agent-callable tools
  6. ActionContext (agent/action_context.py): Dependency bag passed as ctx kwarg to action functions
  7. ActionResult (agent/action_result.py): Structured return type (success: bool, summary: str)
Key Components:
  • DeviceDriver: Raw I/O layer, no element indexing, no event emission
  • StateProvider: Orchestrates fetching and parsing device state into UIState
  • UIState: Element lookup by index, coordinate conversion, formatted text output
  • ActionContext: Bundles driver, ui, shared_state, state_provider for action functions
  • ToolRegistry: Registers action functions and custom tools for agent use
This design ensures:
  • Clean separation between device I/O, UI state management, and agent logic
  • Easy addition of new device types by implementing a new driver
  • Capability detection via the supported set
  • Structured results via ActionResult

Common Interface

A DeviceDriver implementation may provide any of these methods. Check its supported set to see which ones are available:

Lifecycle

  • connect() -> None - Establish connection to the device
  • ensure_connected() -> None - Connect if not already connected

Input Actions

  • tap(x: int, y: int) -> None - Tap at absolute pixel coordinates
  • swipe(x1: int, y1: int, x2: int, y2: int, duration_ms: float = 1000) -> None - Swipe gesture
  • drag(x1: int, y1: int, x2: int, y2: int, duration: float = 3.0) -> None - Drag gesture
  • input_text(text: str, clear: bool = False) -> bool - Text input into focused field
  • press_key(keycode: int) -> None - Key press

App Management

  • start_app(package: str, activity: str | None = None) -> str - Launch app
  • install_app(path: str, **kwargs) -> str - Install app
  • list_packages(include_system: bool = False) -> List[str] - List packages
  • get_apps(include_system: bool = True) -> List[Dict[str, str]] - Get apps with labels

State / Observation

  • screenshot(hide_overlay: bool = True) -> bytes - Capture screen as PNG bytes
  • get_ui_tree() -> Dict[str, Any] - Get raw UI / accessibility tree
  • get_date() -> str - Get device date/time
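
As an illustration of consuming the observation methods, the sketch below saves the bytes returned by screenshot() to a file after checking that they start with the PNG signature. save_screenshot and FakeDriver are hypothetical names introduced for this example, and screenshot() is assumed to be awaitable like the other driver calls on this page.

```python
import asyncio
from pathlib import Path

# The first eight bytes of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


async def save_screenshot(driver, path: Path) -> Path:
    """Capture the screen via driver.screenshot() and write it to `path`."""
    if "screenshot" not in driver.supported:
        raise NotImplementedError("driver does not declare screenshot()")
    data = await driver.screenshot(hide_overlay=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot() did not return PNG data")
    path.write_bytes(data)
    return path


class FakeDriver:
    """Hypothetical driver that returns a PNG signature plus dummy bytes."""

    supported = {"screenshot"}

    async def screenshot(self, hide_overlay: bool = True) -> bytes:
        return PNG_SIGNATURE + b"fake-image-body"
```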

StateProvider

class StateProvider:
    def __init__(self, driver: DeviceDriver): ...
    async def get_state(self) -> UIState: ...
Base class for state providers. Subclass to support different platforms.

AndroidStateProvider

class AndroidStateProvider(StateProvider)
Fetches state from an Android device via driver.get_ui_tree(). Includes retry logic (3 attempts). Applies tree filters and formatters to produce a UIState snapshot.
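
The retry behaviour mentioned above could look something like the sketch below. It is not the provider's actual code: fetch_with_retry and FlakyDriver are hypothetical, the delay and the exceptions caught are assumptions, and the keys in the returned tree are placeholders.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional


async def fetch_with_retry(
    fetch: Callable[[], Awaitable[Dict[str, Any]]],
    attempts: int = 3,
    delay_s: float = 0.0,
) -> Dict[str, Any]:
    """Call fetch() up to `attempts` times and return the first successful result."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return await fetch()
        except Exception as exc:  # a real provider would likely catch narrower errors
            last_error = exc
            if attempt < attempts:
                await asyncio.sleep(delay_s)
    raise RuntimeError(f"state fetch failed after {attempts} attempts") from last_error


class FlakyDriver:
    """Hypothetical driver whose get_ui_tree() fails a fixed number of times."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def get_ui_tree(self) -> Dict[str, Any]:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("device not ready")
        # Placeholder keys; the real tree layout is not specified here.
        return {"a11y_tree": [], "phone_state": {}}
```

With FlakyDriver(2), the third attempt succeeds; with FlakyDriver(3), all three attempts fail and the helper raises RuntimeError chained to the last error.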

UIState

class UIState
Holds parsed UI elements for a single device state snapshot.
Key Methods:
  • get_element(index: int) -> Dict | None - Recursively find an element by its index
  • get_element_coords(index: int) -> Tuple[int, int] - Return the centre (x, y) of an element. Raises ValueError when element is missing or has no bounds.
  • get_element_info(index: int) -> Dict - Return element metadata (text, className, type, child_texts)
Key Attributes:
  • elements - List of parsed UI elements
  • formatted_text - Formatted text representation of the UI tree
  • focused_text - Text of the currently focused element
  • phone_state - Dict with current activity, keyboard visibility, etc.
  • screen_width / screen_height - Device screen dimensions
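
To make the element-resolution methods concrete, here is a minimal sketch of a recursive lookup by index and a centre-point calculation. The element layout is an assumption made for illustration: each element is taken to be a dict with an "index", an optional "bounds" string of the form "left,top,right,bottom", and a "children" list. find_element and element_centre are hypothetical names, not UIState methods.

```python
from typing import Any, Dict, List, Optional, Tuple

Element = Dict[str, Any]


def find_element(elements: List[Element], index: int) -> Optional[Element]:
    """Depth-first search for the element whose 'index' matches."""
    for element in elements:
        if element.get("index") == index:
            return element
        found = find_element(element.get("children", []), index)
        if found is not None:
            return found
    return None


def element_centre(elements: List[Element], index: int) -> Tuple[int, int]:
    """Return the centre (x, y) of an element, or raise ValueError."""
    element = find_element(elements, index)
    if element is None:
        raise ValueError(f"no element with index {index}")
    bounds = element.get("bounds")
    if not bounds:
        raise ValueError(f"element {index} has no bounds")
    left, top, right, bottom = (int(v) for v in bounds.split(","))
    return (left + right) // 2, (top + bottom) // 2


# Small hand-written tree used only to exercise the helpers.
tree = [
    {"index": 1, "bounds": "0,0,1080,200", "children": [
        {"index": 2, "bounds": "100,50,300,150", "children": []},
    ]},
    {"index": 3, "children": []},
]

print(element_centre(tree, 2))  # (200, 100)
```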

ActionContext

class ActionContext
Everything an action function needs to interact with the device.
Attributes:
  • driver - DeviceDriver instance for raw device I/O
  • ui - UIState instance for element resolution (refreshed each step)
  • shared_state - DroidAgentState for shared agent state
  • state_provider - StateProvider for fetching fresh UI state
  • app_opener_llm - LLM instance for app opening workflow (optional)
  • credential_manager - CredentialManager instance (optional)
  • streaming - Whether streaming is enabled

ActionResult

@dataclass
class ActionResult:
    success: bool
    summary: str
Structured return type from action functions. The summary field is what the agent sees.

Action Functions

Action functions live in agent/utils/actions.py and follow this pattern:
async def click(index: int, *, ctx: ActionContext) -> ActionResult:
    """Click the element with the given index."""
    x, y = ctx.ui.get_element_coords(index)
    await ctx.driver.tap(x, y)
    return ActionResult(success=True, summary=f"Clicked on element at ({x}, {y})")
Available actions:
  • click(index) - Click UI element by index
  • click_at(x, y) - Click at screen coordinates
  • click_area(area) - Click a named screen area
  • long_press(index) - Long press UI element by index
  • long_press_at(x, y) - Long press at screen coordinates
  • type(text, index) - Input text into element
  • type_secret(secret_id) - Input a credential secret
  • swipe(coordinate, coordinate2) - Swipe gesture
  • system_button(button) - Press system buttons (back, home, enter)
  • open_app(text) - Open app by name
  • wait(seconds) - Wait for a duration
  • remember(information) - Store info in agent memory
  • complete(success, reason) - Mark task as finished
  • get_state() - Get accessibility tree + phone state
  • take_screenshot() - Capture device screen
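
Since get_element_coords() raises ValueError for a missing element, a custom action may want to turn that into a failed ActionResult instead of letting the exception propagate. How the built-in actions handle this is not stated here, so the following is only a sketch: safe_click, FakeUI, FakeDriver, and FakeContext are hypothetical stand-ins, and ActionResult is redefined locally to keep the example self-contained.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ActionResult:
    success: bool
    summary: str


class FakeUI:
    """Hypothetical UI state that knows a single element, index 5."""

    def get_element_coords(self, index: int) -> Tuple[int, int]:
        if index != 5:
            raise ValueError(f"No element found with index {index}")
        return (540, 960)


@dataclass
class FakeDriver:
    taps: List[Tuple[int, int]] = field(default_factory=list)

    async def tap(self, x: int, y: int) -> None:
        self.taps.append((x, y))


@dataclass
class FakeContext:
    driver: FakeDriver
    ui: FakeUI


async def safe_click(index: int, *, ctx: FakeContext) -> ActionResult:
    """Click an element, reporting lookup failures through ActionResult."""
    try:
        x, y = ctx.ui.get_element_coords(index)
    except ValueError as exc:
        # The summary is what the agent sees, so include the reason.
        return ActionResult(success=False, summary=f"Click failed: {exc}")
    await ctx.driver.tap(x, y)
    return ActionResult(success=True, summary=f"Clicked on element at ({x}, {y})")
```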

ToolRegistry

class ToolRegistry
Central registry of all agent-callable tools.
Methods:
  • register(name, fn, params, description) - Register a single tool
  • register_from_dict(tools_dict) - Register tools from {"name": {"parameters": {...}, "description": "...", "function": callable}} format
  • disable(tool_names) - Remove tools by name
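
The three methods can be pictured with a dictionary-backed sketch like the one below. SketchToolRegistry is a hypothetical class written for illustration; the real ToolRegistry may store tools differently or perform validation that is not shown here.

```python
from typing import Any, Callable, Dict, Iterable


class SketchToolRegistry:
    """Hypothetical registry mirroring the three documented method names."""

    def __init__(self) -> None:
        self.tools: Dict[str, Dict[str, Any]] = {}

    def register(
        self,
        name: str,
        fn: Callable[..., Any],
        params: Dict[str, Any],
        description: str,
    ) -> None:
        # Store each tool under its name, in the same shape as the dict format.
        self.tools[name] = {
            "parameters": params,
            "description": description,
            "function": fn,
        }

    def register_from_dict(self, tools_dict: Dict[str, Dict[str, Any]]) -> None:
        for name, spec in tools_dict.items():
            self.register(
                name,
                spec["function"],
                spec.get("parameters", {}),
                spec.get("description", ""),
            )

    def disable(self, tool_names: Iterable[str]) -> None:
        # Unknown names are ignored in this sketch.
        for name in tool_names:
            self.tools.pop(name, None)


def echo(param: str, **kwargs: Any) -> str:
    return f"Result: {param}"
```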

Custom Tool Integration

Adding Custom Tools

def my_custom_tool(param: str, **kwargs) -> str:
    """Custom tool description."""
    return f"Result: {param}"

custom_tools = {
    "my_custom_tool": {
        "parameters": {
            "param": {"type": "string", "required": True},
        },
        "description": "Custom tool description with usage example",
        "function": my_custom_tool
    }
}

agent = DroidAgent(
    goal="Do something",
    config=config,
    custom_tools=custom_tools
)

Platform Comparison

Feature      | AndroidDriver                     | IOSDriver
-------------|-----------------------------------|-----------------------------------
Connection   | ADB + Portal (USB/TCP)            | HTTP (Portal app)
tap          | Absolute coordinates              | Absolute coordinates
swipe        | Coordinate-based                  | Direction-based
drag         | Declared but not yet implemented  | Not supported
input_text   | With clear support                | No clear support
press_key    | Full Android keycodes             | HOME only (BACK/ENTER unsupported)
screenshot   | PNG via Portal                    | PNG via HTTP
get_ui_tree  | Accessibility tree + phone state  | Accessibility tree + phone state
get_date     | Via ADB shell                     | Not available (returns empty)
get_apps     | Full packages with labels         | Bundle identifiers only
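
Because swipe is coordinate-based on Android and direction-based on iOS, code that targets both platforms may need to translate between the two. The helper below is a hypothetical sketch of one such translation. The direction names ("up", "down", "left", "right") are an assumption, and how IOSDriver itself derives a direction is not documented here.

```python
def swipe_direction(x1: int, y1: int, x2: int, y2: int) -> str:
    """Reduce a coordinate-based swipe to its dominant direction.

    Screen coordinates are assumed to grow rightwards (x) and downwards (y).
    """
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        raise ValueError("start and end points are identical")
    # Pick the axis with the larger movement; ties go to the horizontal axis.
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


print(swipe_direction(100, 500, 100, 100))  # up
```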

Best Practices

1. Check supported methods before calling

if "get_date" in driver.supported:
    date = await driver.get_date()
else:
    date = "Unknown"

2. Use ActionContext for agent-level interactions

# Action functions use ctx for all device interaction
async def my_action(param: str, *, ctx: ActionContext) -> ActionResult:
    x, y = ctx.ui.get_element_coords(5)
    await ctx.driver.tap(x, y)
    return ActionResult(success=True, summary="Done")

3. Use StateProvider for UI state

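from droidrun.tools import AndroidStateProvider, AndroidDriver

# Assumes `driver` is an existing, connected AndroidDriver and that
# my_filter / my_formatter are a tree filter and formatter defined elsewhere.
# StateProvider handles fetching + parsing + retries
provider = AndroidStateProvider(driver, tree_filter=my_filter, tree_formatter=my_formatter)
ui_state = await provider.get_state()

# UIState provides element lookup
element = ui_state.get_element(5)       # None if no element has index 5
x, y = ui_state.get_element_coords(5)   # raises ValueError if missing or without bounds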

Error Handling

Driver methods report an unsupported operation by raising NotImplementedError:
# Methods not in `supported` set raise NotImplementedError
try:
    await driver.drag(100, 500, 100, 100)
except NotImplementedError:
    print("Drag not supported on this driver")
ActionResult for action functions:
result = await click(5, ctx=ctx)
if not result.success:
    print(f"Action failed: {result.summary}")
