Base class defining the interface for all device drivers.

DeviceDriver

class DeviceDriver
Base class for all device drivers. Every method raises NotImplementedError by default. Concrete drivers override the methods they support and declare them in the supported class-level set. This allows capability checking at runtime without introspection.
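The capability pattern described above can be sketched as follows. `SketchDriver` and `FakeTapDriver` are hypothetical stand-ins, not droidrun classes. The point is that a capability check reduces to set membership, and any method a subclass does not override still raises NotImplementedError. Methods are written as coroutines because the examples later on this page `await` driver calls.

```python
import asyncio
from typing import List, Set, Tuple


class SketchDriver:
    """Hypothetical stand-in for DeviceDriver: every method raises by default."""

    supported: Set[str] = set()

    async def tap(self, x: int, y: int) -> None:
        raise NotImplementedError("tap")

    async def get_date(self) -> str:
        raise NotImplementedError("get_date")


class FakeTapDriver(SketchDriver):
    """Overrides only tap() and declares it in `supported`."""

    supported = {"tap"}

    def __init__(self) -> None:
        self.taps: List[Tuple[int, int]] = []

    async def tap(self, x: int, y: int) -> None:
        self.taps.append((x, y))


async def demo() -> List[str]:
    driver = FakeTapDriver()
    log = []
    for name in ("tap", "get_date"):
        # Plain set membership; no hasattr()/inspect calls needed.
        log.append(f"{name}: {'supported' if name in driver.supported else 'skipped'}")
    await driver.tap(10, 20)
    return log


print(asyncio.run(demo()))  # ['tap: supported', 'get_date: skipped']
```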

Quick Reference

Driver Methods:
  • connect(), ensure_connected(), tap(), swipe(), input_text(), press_key(), drag(), start_app(), install_app(), get_apps(), list_packages(), screenshot(), get_ui_tree(), get_date()
Key Attribute:
  • supported: set[str] - Set of method names the driver implements. Check membership before calling.

Architecture

The tools architecture follows a multi-layer pattern:
  1. DeviceDriver (tools/driver/base.py): Base class for raw device I/O. Methods raise NotImplementedError by default.
  2. Driver Implementations: Platform-specific drivers
    • AndroidDriver (tools/driver/android.py): Android devices via ADB + Portal app
    • IOSDriver (tools/driver/ios.py): iOS devices via HTTP REST API to Portal app
    • StealthDriver (tools/driver/stealth.py): Wraps another driver, adds human-like timing jitter
    • RecordingDriver (tools/driver/recording.py): Wraps another driver with trajectory recording
    • CloudDriver (tools/driver/cloud.py): Cloud-hosted device driver
  3. StateProvider (tools/ui/provider.py): Fetches raw data from a driver, applies filters/formatters, produces UIState
  4. UIState (tools/ui/state.py): Parsed UI elements with element resolution (get_element(), get_element_coords(), get_element_info(), get_clear_point(), convert_point())
  5. ToolRegistry (agent/tool_registry.py): Central registry of all agent-callable tools
  6. ActionContext (agent/action_context.py): Dependency bag passed as ctx kwarg to action functions
  7. ActionResult (agent/action_result.py): Structured return type (success: bool, summary: str)
Key Components:
  • DeviceDriver: Raw I/O layer, no element indexing, no event emission
  • StateProvider: Orchestrates fetching and parsing device state into UIState
  • UIState: Element lookup by index, coordinate conversion, formatted text output
  • ActionContext: Bundles driver, ui, shared_state, state_provider for action functions
  • ToolRegistry: Registers action functions and custom tools for agent use
This design ensures:
  • Clean separation between device I/O, UI state management, and agent logic
  • Easy addition of new device types by implementing a new driver
  • Capability detection via the supported set
  • Structured results via ActionResult
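Wrapper drivers such as StealthDriver and RecordingDriver sit between the caller and another driver. A minimal sketch of that delegation idea, assuming a hypothetical `JitterDriver` that offsets taps by a few pixels and pauses for a random interval before forwarding. It is not StealthDriver's actual implementation, and the jitter parameters are made up for illustration.

```python
import asyncio
import random
from typing import List, Optional, Set, Tuple


class TapRecorder:
    """Hypothetical inner driver that records taps instead of touching a device."""

    supported: Set[str] = {"tap"}

    def __init__(self) -> None:
        self.taps: List[Tuple[int, int]] = []

    async def tap(self, x: int, y: int) -> None:
        self.taps.append((x, y))


class JitterDriver:
    """Hypothetical wrapper: forwards calls to `inner` with random offsets and delays."""

    def __init__(self, inner, max_offset: int = 3, max_delay: float = 0.05,
                 seed: Optional[int] = None) -> None:
        self.inner = inner
        self.max_offset = max_offset
        self.max_delay = max_delay
        self.rng = random.Random(seed)

    @property
    def supported(self) -> Set[str]:
        # A wrapper advertises exactly the capabilities of the driver it wraps.
        return self.inner.supported

    async def tap(self, x: int, y: int) -> None:
        await asyncio.sleep(self.rng.uniform(0, self.max_delay))  # human-like pause
        dx = self.rng.randint(-self.max_offset, self.max_offset)
        dy = self.rng.randint(-self.max_offset, self.max_offset)
        await self.inner.tap(x + dx, y + dy)


inner = TapRecorder()
driver = JitterDriver(inner, max_delay=0.0, seed=7)
asyncio.run(driver.tap(100, 200))
print(inner.taps)  # one tap within 3 px of (100, 200)
```

Because the wrapper re-exports `supported` from the inner driver, capability checks keep working no matter how many wrappers are stacked.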

Common Interface

Any DeviceDriver implementation may provide the methods below; check its supported set before relying on one:

Lifecycle

  • connect() -> None - Establish connection to the device
  • ensure_connected() -> None - Connect if not already connected
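`ensure_connected()` is documented as "connect if not already connected". One common way to get that idempotence, shown with a hypothetical driver that only counts connections (a real driver's `connect()` would set up ADB or HTTP sessions):

```python
import asyncio


class ConnectingDriver:
    """Hypothetical driver showing one way ensure_connected() can wrap connect()."""

    def __init__(self) -> None:
        self.connected = False
        self.connect_calls = 0

    async def connect(self) -> None:
        # A real driver would open its device session here.
        self.connect_calls += 1
        self.connected = True

    async def ensure_connected(self) -> None:
        # Connect only when no connection exists yet; later calls are no-ops.
        if not self.connected:
            await self.connect()


async def main() -> int:
    driver = ConnectingDriver()
    await driver.ensure_connected()
    await driver.ensure_connected()
    return driver.connect_calls


print(asyncio.run(main()))  # 1
```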

Input Actions

  • tap(x: int, y: int) -> None - Tap at absolute pixel coordinates
  • swipe(x1: int, y1: int, x2: int, y2: int, duration_ms: float = 1000) -> None - Swipe gesture
  • drag(x1: int, y1: int, x2: int, y2: int, duration: float = 3.0) -> None - Drag gesture
  • input_text(text: str, clear: bool = False) -> bool - Text input into focused field
  • press_key(keycode: int) -> None - Key press
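A typical use of the input methods is a scroll helper built on `swipe()`. The sketch assumes `swipe()` is awaitable, as the other driver calls on this page are; `scroll_down` and `SwipeRecorder` are hypothetical names, and the 75%/25% start and end points are arbitrary choices. Note that `swipe()` takes `duration_ms`, while `drag()` takes `duration`.

```python
import asyncio


async def scroll_down(driver, screen_width: int, screen_height: int,
                      duration_ms: float = 500) -> None:
    """Hypothetical helper: swipe upward through the screen centre to scroll content down."""
    x = screen_width // 2
    start_y = int(screen_height * 0.75)  # start low on the screen
    end_y = int(screen_height * 0.25)    # finish high, pulling content up
    await driver.swipe(x, start_y, x, end_y, duration_ms=duration_ms)


class SwipeRecorder:
    """Stand-in driver that records swipe arguments instead of touching a device."""

    def __init__(self) -> None:
        self.calls = []

    async def swipe(self, x1, y1, x2, y2, duration_ms=1000):
        self.calls.append((x1, y1, x2, y2, duration_ms))


recorder = SwipeRecorder()
asyncio.run(scroll_down(recorder, 1080, 2400))
print(recorder.calls)  # [(540, 1800, 540, 600, 500)]
```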

App Management

  • start_app(package: str, activity: str | None = None) -> str - Launch app
  • install_app(path: str, **kwargs) -> str - Install app
  • list_packages(include_system: bool = False) -> List[str] - List packages
  • get_apps(include_system: bool = True) -> List[Dict[str, str]] - Get apps with labels
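`list_packages()` returns plain package names, which is enough for a simple lookup. A sketch assuming the method is awaitable; `find_packages` and `PackageStub` are hypothetical. `get_apps()` adds labels, but the dictionary keys it uses are not documented here, so the sketch avoids it.

```python
import asyncio
from typing import List


async def find_packages(driver, keyword: str) -> List[str]:
    """Hypothetical helper: case-insensitive substring search over package names."""
    packages = await driver.list_packages(include_system=False)
    needle = keyword.lower()
    return sorted(p for p in packages if needle in p.lower())


class PackageStub:
    """Stand-in driver returning a fixed package list."""

    async def list_packages(self, include_system: bool = False) -> List[str]:
        return [
            "com.android.chrome",
            "org.mozilla.firefox",
            "com.google.android.apps.chromecast.app",
        ]


print(asyncio.run(find_packages(PackageStub(), "chrome")))
# ['com.android.chrome', 'com.google.android.apps.chromecast.app']
```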

State / Observation

  • screenshot(hide_overlay: bool = True) -> bytes - Capture screen as PNG bytes
  • get_ui_tree() -> Dict[str, Any] - Get raw UI / accessibility tree
  • get_date() -> str - Get device date/time
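`screenshot()` returns PNG bytes, so persisting it is a single write. This sketch checks the standard 8-byte PNG signature before saving; `save_screenshot` and `StaticScreenshotDriver` are hypothetical, and the driver method is assumed to be awaitable.

```python
from pathlib import Path

# Every PNG file starts with these 8 bytes (PNG specification).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


async def save_screenshot(driver, path: str) -> int:
    """Hypothetical helper: capture the screen, write it to `path`, return the byte count."""
    data = await driver.screenshot(hide_overlay=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("driver did not return PNG data")
    Path(path).write_bytes(data)
    return len(data)


class StaticScreenshotDriver:
    """Stand-in driver returning a fixed (truncated, not decodable) PNG payload."""

    async def screenshot(self, hide_overlay: bool = True) -> bytes:
        return PNG_SIGNATURE + b"payload"
```

For example, `asyncio.run(save_screenshot(StaticScreenshotDriver(), "screen.png"))` writes the file and returns 15 (8 signature bytes plus 7 payload bytes).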

StateProvider

class StateProvider:
    def __init__(self, driver: DeviceDriver): ...
    async def get_state(self) -> UIState: ...
Base class for state providers. Subclass to support different platforms. Declares a supported set for capability checking (e.g. {"element_index", "convert_point"}).

AndroidStateProvider

class AndroidStateProvider(StateProvider)
Fetches state from an Android device via driver.get_ui_tree(). Includes retry logic (3 attempts). Applies tree filters and formatters to produce a UIState snapshot. Constructor accepts stealth: bool to select StealthUIState (randomized tap coordinates within element bounds) vs regular UIState.
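AndroidStateProvider retries the tree fetch up to 3 times. A generic sketch of such a retry loop; the helper name, the fixed delay, and the broad `except` are assumptions, not the provider's actual code.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional


async def fetch_with_retry(
    fetch: Callable[[], Awaitable[Dict[str, Any]]],
    attempts: int = 3,
    delay: float = 0.5,
) -> Dict[str, Any]:
    """Hypothetical helper: await `fetch()` up to `attempts` times, re-raising the last error."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return await fetch()
        except Exception as exc:  # a real provider would likely catch narrower errors
            last_error = exc
            if attempt < attempts:
                await asyncio.sleep(delay)  # give the device a moment before retrying
    raise last_error


calls = {"n": 0}


async def flaky_get_ui_tree() -> Dict[str, Any]:
    # Fails twice, then succeeds, like a device that is briefly unresponsive.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("device not ready")
    return {"placeholder": "tree"}


tree = asyncio.run(fetch_with_retry(flaky_get_ui_tree, delay=0.0))
print(calls["n"], tree)  # 3 {'placeholder': 'tree'}
```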

UIState

class UIState
Holds parsed UI elements for a single device state snapshot. Key Methods:
  • get_element(index: int) -> Dict | None - Recursively find an element by its index
  • get_element_coords(index: int) -> Tuple[int, int] - Return the centre (x, y) of an element. Raises ValueError when element is missing or has no bounds.
  • get_element_info(index: int) -> Dict - Return element metadata (text, className, type, child_texts)
  • get_clear_point(index: int) -> Tuple[int, int] - Find a tap point that avoids overlapping elements (falls back to centre)
  • convert_point(x: int, y: int) -> Tuple[int, int] - Convert point to absolute pixels if normalized mode is active
Key Attributes:
  • elements - List of parsed UI elements
  • formatted_text - Formatted text representation of the UI tree
  • focused_text - Text of the currently focused element
  • phone_state - Dict with current activity, keyboard visibility, etc.
  • screen_width / screen_height - Device screen dimensions
  • use_normalized - Whether normalized coordinate mode is active
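The lookup that `get_element()` and `get_element_coords()` perform can be sketched as a depth-first search followed by a bounds-centre computation. The element schema here (`index`, `bounds` as a `(left, top, right, bottom)` tuple, `children`) is hypothetical; droidrun's parsed elements may store these fields differently.

```python
from typing import Any, Dict, List, Optional, Tuple

Element = Dict[str, Any]


def find_element(elements: List[Element], index: int) -> Optional[Element]:
    """Depth-first search for the element whose "index" matches (hypothetical schema)."""
    for element in elements:
        if element.get("index") == index:
            return element
        found = find_element(element.get("children", []), index)
        if found is not None:
            return found
    return None


def element_center(elements: List[Element], index: int) -> Tuple[int, int]:
    """Centre of the element's bounds; ValueError if the element or its bounds are missing."""
    element = find_element(elements, index)
    if element is None:
        raise ValueError(f"No element with index {index}")
    bounds = element.get("bounds")
    if not bounds:
        raise ValueError(f"Element {index} has no bounds")
    left, top, right, bottom = bounds
    return (left + right) // 2, (top + bottom) // 2


tree = [
    {"index": 1, "bounds": (0, 0, 1080, 200), "children": [
        {"index": 2, "bounds": (40, 50, 240, 150), "children": []},
    ]},
    {"index": 3, "children": []},  # no bounds
]
print(element_center(tree, 2))  # (140, 100)
```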

ActionContext

class ActionContext
Everything an action function needs to interact with the device. Attributes:
  • driver - DeviceDriver instance for raw device I/O
  • ui - UIState instance for element resolution (refreshed each step)
  • shared_state - DroidAgentState for shared agent state
  • state_provider - StateProvider for fetching fresh UI state
  • app_opener_llm - LLM instance for app opening workflow (optional)
  • credential_manager - CredentialManager instance (optional)
  • streaming - Whether streaming is enabled
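For testing action functions outside the agent, a structurally similar context object can be useful. This hypothetical dataclass mirrors the documented attributes with loosened types; the defaults and the `refresh_ui()` helper are assumptions illustrating how `ui` could be refreshed from `state_provider`, not ActionContext's real API.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ActionContextSketch:
    """Hypothetical mirror of ActionContext's documented attributes (types loosened to Any)."""

    driver: Any                 # DeviceDriver for raw device I/O
    ui: Any                     # UIState, refreshed each step
    shared_state: Any           # DroidAgentState
    state_provider: Any         # StateProvider
    app_opener_llm: Optional[Any] = None
    credential_manager: Optional[Any] = None
    streaming: bool = False     # assumed default

    async def refresh_ui(self) -> None:
        # Hypothetical helper: re-fetch state so `ui` reflects the current screen.
        self.ui = await self.state_provider.get_state()
```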

ActionResult

@dataclass
class ActionResult:
    success: bool
    summary: str
Structured return type from action functions. The summary field is what the agent sees.
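Because `get_element_coords()` raises ValueError for a missing element or one without bounds, an action can turn that into a failed ActionResult so the agent sees an explanatory summary. A sketch; `safe_click` is hypothetical, and ActionResult is redefined locally so the block runs standalone.

```python
from dataclasses import dataclass


@dataclass
class ActionResult:  # local stand-in matching the fields documented above
    success: bool
    summary: str


async def safe_click(index: int, *, ctx) -> ActionResult:
    """Hypothetical variant of click(): reports a missing element instead of raising."""
    try:
        x, y = ctx.ui.get_element_coords(index)
    except ValueError as exc:
        # Raised when the element is missing or has no bounds.
        return ActionResult(success=False, summary=f"Could not click element {index}: {exc}")
    await ctx.driver.tap(x, y)
    return ActionResult(success=True, summary=f"Clicked on element at ({x}, {y})")
```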

Action Functions

Action functions live in agent/utils/actions.py and follow this pattern:
async def click(index: int, *, ctx: ActionContext) -> ActionResult:
    """Click the element with the given index."""
    x, y = ctx.ui.get_element_coords(index)
    await ctx.driver.tap(x, y)
    return ActionResult(success=True, summary=f"Clicked on element at ({x}, {y})")
Available actions:
  • click(index) - Click UI element by index
  • click_at(x, y) - Click at screen coordinates
  • click_area(x1, y1, x2, y2) - Click center of area defined by coordinates
  • long_press(index) - Long press UI element by index
  • long_press_at(x, y) - Long press at screen coordinates
  • type(text, index, clear=False) - Input text into element (set clear=True to clear field first)
  • type_secret(secret_id, index) - Input a credential secret into element by index
  • swipe(coordinate, coordinate2, duration=1.0) - Swipe gesture between two coordinate lists
  • system_button(button) - Press system buttons (back, home, enter)
  • open_app(text) - Open app by name or description
  • wait(duration=1.0) - Wait for a duration in seconds
  • remember(information) - Store info in agent memory
  • complete(success, message) - Mark task as finished
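Two of the actions above map onto driver calls with a little arithmetic. The sketches below are hypothetical reimplementations, not the code in agent/utils/actions.py: `click_area` taps the rectangle's centre, and `swipe` assumes its `duration` is in seconds (as for `wait`) and converts it to the driver's `duration_ms`. The real actions may also pass coordinates through `ctx.ui.convert_point()`; that step is omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ActionResult:  # local stand-in matching the documented fields
    success: bool
    summary: str


async def click_area(x1: int, y1: int, x2: int, y2: int, *, ctx) -> ActionResult:
    """Sketch: tap the centre of the rectangle spanned by (x1, y1) and (x2, y2)."""
    x, y = (x1 + x2) // 2, (y1 + y2) // 2
    await ctx.driver.tap(x, y)
    return ActionResult(success=True, summary=f"Clicked area centre at ({x}, {y})")


async def swipe(coordinate: List[int], coordinate2: List[int],
                duration: float = 1.0, *, ctx) -> ActionResult:
    """Sketch: the action takes seconds (assumed); the driver's swipe() takes duration_ms."""
    (x1, y1), (x2, y2) = coordinate, coordinate2
    await ctx.driver.swipe(x1, y1, x2, y2, duration_ms=duration * 1000)
    return ActionResult(success=True, summary=f"Swiped from ({x1}, {y1}) to ({x2}, {y2})")
```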

ToolRegistry

class ToolRegistry
Central registry of all agent-callable tools. Methods:
  • register(name, fn, params, description, deps=None) - Register a single tool with optional capability dependencies
  • register_from_dict(tools_dict) - Register tools from {"name": {"parameters": {...}, "description": "...", "function": callable, "deps": set}} format
  • disable(tool_names) - Remove tools by name (silently ignores unknown names)
  • disable_unsupported(capabilities) - Remove tools whose deps are not satisfied by the given capabilities set
  • execute(name, args, ctx, workflow_ctx=None) - Dispatch action by name, returns ActionResult
  • get_tool_descriptions_xml(exclude=None) - Build XML <functions> block for FastAgent
  • get_tool_descriptions_text(exclude=None) - Build text descriptions for executor prompts
  • get_param_types(exclude=None) - Build flat {param_name: type_string} map for XML coercion
  • get_signatures(exclude=None) - Return {name: {parameters, description}} dict for prompt building
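The interplay of `register`, `disable_unsupported`, and `execute` can be sketched with a stripped-down registry. `MiniRegistry` is hypothetical and omits the prompt-building methods; the key idea is that a tool survives only if its `deps` set is a subset of the available capabilities.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Set


class MiniRegistry:
    """Hypothetical registry illustrating deps-based filtering and dispatch."""

    def __init__(self) -> None:
        self.tools: Dict[str, Dict[str, Any]] = {}

    def register(self, name: str, fn: Callable, params: Dict[str, Any],
                 description: str, deps: Optional[Set[str]] = None) -> None:
        self.tools[name] = {"fn": fn, "params": params,
                            "description": description, "deps": set(deps or ())}

    def disable(self, tool_names: Iterable[str]) -> None:
        for name in tool_names:
            self.tools.pop(name, None)  # unknown names are silently ignored

    def disable_unsupported(self, capabilities: Set[str]) -> None:
        # Keep a tool only if every capability it depends on is available.
        self.disable([n for n, t in self.tools.items() if not t["deps"] <= capabilities])

    async def execute(self, name: str, args: Dict[str, Any], ctx: Any) -> Any:
        # Dispatch by name, passing ctx as a keyword argument like the action functions.
        return await self.tools[name]["fn"](**args, ctx=ctx)
```

For instance, a tool registered with `deps={"drag"}` is removed by `disable_unsupported({"tap", "swipe"})`, while a tool with no deps always remains.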

Custom Tool Integration

Adding Custom Tools

def my_custom_tool(param: str, **kwargs) -> str:
    """Custom tool description."""
    return f"Result: {param}"

custom_tools = {
    "my_custom_tool": {
        "parameters": {
            "param": {"type": "string", "required": True},
        },
        "description": "Custom tool description with usage example",
        "function": my_custom_tool
    }
}

agent = DroidAgent(
    goal="Do something",
    config=config,
    custom_tools=custom_tools
)
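A malformed `custom_tools` entry is easier to catch before the agent starts. A hypothetical pre-flight check for the dictionary format used above (and by `register_from_dict`); requiring a `type` on each parameter and a set for the optional `deps` key are assumptions drawn from the examples on this page, not documented rules.

```python
from typing import Any, Dict

REQUIRED_KEYS = ("parameters", "description", "function")


def validate_custom_tools(tools: Dict[str, Dict[str, Any]]) -> None:
    """Hypothetical check: raise if an entry does not match the custom_tools format."""
    for name, spec in tools.items():
        missing = [key for key in REQUIRED_KEYS if key not in spec]
        if missing:
            raise ValueError(f"{name}: missing keys {missing}")
        if not callable(spec["function"]):
            raise TypeError(f"{name}: 'function' must be callable")
        for param, meta in spec["parameters"].items():
            if "type" not in meta:
                raise ValueError(f"{name}.{param}: parameter has no 'type'")
        if "deps" in spec and not isinstance(spec["deps"], set):
            raise TypeError(f"{name}: 'deps' must be a set")
```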

Platform Comparison

| Feature | AndroidDriver | IOSDriver |
|---|---|---|
| Connection | ADB + Portal (USB/TCP) | HTTP (Portal app) |
| tap | Absolute coordinates | Absolute coordinates |
| swipe | Coordinate-based | Direction-based |
| drag | Declared but not yet implemented | Not supported |
| input_text | With clear support | No clear support |
| press_key | Full Android keycodes | HOME only (BACK/ENTER unsupported) |
| screenshot | PNG via Portal | PNG via HTTP |
| get_ui_tree | Accessibility tree + phone state | Accessibility tree + phone state |
| get_date | Via ADB shell | Not available (returns empty) |
| get_apps | Full packages with labels | Bundle identifiers only |

Best Practices

1. Check supported methods before calling

if "get_date" in driver.supported:
    date = await driver.get_date()
else:
    date = "Unknown"

2. Use ActionContext for agent-level interactions

# Action functions use ctx for all device interaction
async def my_action(param: str, *, ctx: ActionContext) -> ActionResult:
    x, y = ctx.ui.get_element_coords(5)
    await ctx.driver.tap(x, y)
    return ActionResult(success=True, summary="Done")

3. Use StateProvider for UI state

from droidrun.tools import AndroidStateProvider, AndroidDriver

# driver: a connected AndroidDriver; my_filter / my_formatter: your own tree filter and formatter
# StateProvider handles fetching + parsing + retries
provider = AndroidStateProvider(driver, tree_filter=my_filter, tree_formatter=my_formatter)
ui_state = await provider.get_state()

# UIState provides element lookup
element = ui_state.get_element(5)
x, y = ui_state.get_element_coords(5)

Error Handling

Calling a driver method that is not in the driver's supported set raises NotImplementedError:
# Methods not in `supported` set raise NotImplementedError
try:
    await driver.drag(100, 500, 100, 100)
except NotImplementedError:
    print("Drag not supported on this driver")
Action functions report their outcome through ActionResult; check success before continuing:
result = await click(5, ctx=ctx)
if not result.success:
    print(f"Action failed: {result.summary}")

See Also