DeviceDriver
`DeviceDriver` is the base class for raw device I/O. Every method raises `NotImplementedError` by default. Concrete drivers override the methods they support and declare them in the class-level `supported` set. This allows capability checking at runtime without introspection.
Quick Reference
- Driver methods: `connect()`, `ensure_connected()`, `tap()`, `swipe()`, `input_text()`, `press_key()`, `drag()`, `start_app()`, `install_app()`, `get_apps()`, `list_packages()`, `screenshot()`, `get_ui_tree()`, `get_date()`
- `supported: set[str]` - Set of method names the driver implements. Check membership before calling.
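
The declaring side of this contract can be sketched as follows. This is a minimal stand-in, not the real `tools/driver/base.py`: the base class here defines only two of the listed methods, and `TapOnlyDriver` is a hypothetical driver invented for illustration.

```python
class DeviceDriver:
    """Stand-in base class: every method raises NotImplementedError."""

    # Names of the methods a concrete driver actually implements.
    supported: set[str] = set()

    def tap(self, x: int, y: int) -> None:
        raise NotImplementedError

    def get_date(self) -> str:
        raise NotImplementedError


class TapOnlyDriver(DeviceDriver):
    """Hypothetical driver that implements tap() and nothing else."""

    supported = {"tap"}

    def __init__(self) -> None:
        self.taps: list[tuple[int, int]] = []

    def tap(self, x: int, y: int) -> None:
        # Record the tap instead of talking to a real device.
        self.taps.append((x, y))


driver = TapOnlyDriver()

# Callers test membership in `supported` instead of probing the method.
if "tap" in driver.supported:
    driver.tap(10, 20)

date_available = "get_date" in driver.supported
```

Because `supported` is a plain set of strings, a caller can decide up front which actions to offer without calling anything on the device.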
Architecture
The tools architecture follows a multi-layer pattern:

- DeviceDriver (`tools/driver/base.py`): Base class for raw device I/O. Methods raise `NotImplementedError` by default.
- Driver Implementations: Platform-specific drivers
  - `AndroidDriver` (`tools/driver/android.py`): Android devices via ADB + Portal app
  - `IOSDriver` (`tools/driver/ios.py`): iOS devices via HTTP REST API to Portal app
  - `StealthDriver` (`tools/driver/stealth.py`): Stealth mode driver
  - `RecordingDriver` (`tools/driver/recording.py`): Wraps another driver with trajectory recording
  - `CloudDriver` (`tools/driver/cloud.py`): Cloud-hosted device driver
- StateProvider (`tools/ui/provider.py`): Fetches raw data from a driver, applies filters/formatters, produces `UIState`
- UIState (`tools/ui/state.py`): Parsed UI elements with element resolution (`get_element()`, `get_element_coords()`, `get_element_info()`)
- ToolRegistry (`agent/tool_registry.py`): Central registry of all agent-callable tools
- ActionContext (`agent/action_context.py`): Dependency bag passed as `ctx` kwarg to action functions
- ActionResult (`agent/action_result.py`): Structured return type (`success: bool`, `summary: str`)

Responsibilities of each layer:

- DeviceDriver: Raw I/O layer, no element indexing, no event emission
- StateProvider: Orchestrates fetching and parsing device state into `UIState`
- UIState: Element lookup by index, coordinate conversion, formatted text output
- ActionContext: Bundles `driver`, `ui`, `shared_state`, `state_provider` for action functions
- ToolRegistry: Registers action functions and custom tools for agent use

This design provides:

- Clean separation between device I/O, UI state management, and agent logic
- Easy addition of new device types by implementing a new driver
- Capability detection via the `supported` set
- Structured results via `ActionResult`
Common Interface
All `DeviceDriver` implementations may provide these methods (check the `supported` set for availability):
Lifecycle
- `connect() -> None` - Establish connection to the device
- `ensure_connected() -> None` - Connect if not already connected
Input Actions
- `tap(x: int, y: int) -> None` - Tap at absolute pixel coordinates
- `swipe(x1: int, y1: int, x2: int, y2: int, duration_ms: float = 1000) -> None` - Swipe gesture
- `drag(x1: int, y1: int, x2: int, y2: int, duration: float = 3.0) -> None` - Drag gesture
- `input_text(text: str, clear: bool = False) -> bool` - Text input into focused field
- `press_key(keycode: int) -> None` - Key press
App Management
- `start_app(package: str, activity: str | None = None) -> str` - Launch app
- `install_app(path: str, **kwargs) -> str` - Install app
- `list_packages(include_system: bool = False) -> List[str]` - List packages
- `get_apps(include_system: bool = True) -> List[Dict[str, str]]` - Get apps with labels
State / Observation
- `screenshot(hide_overlay: bool = True) -> bytes` - Capture screen as PNG bytes
- `get_ui_tree() -> Dict[str, Any]` - Get raw UI / accessibility tree
- `get_date() -> str` - Get device date/time
StateProvider
AndroidStateProvider
Fetches the raw UI tree via `driver.get_ui_tree()`. Includes retry logic (3 attempts). Applies tree filters and formatters to produce a `UIState` snapshot.
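
The retry behaviour can be illustrated with a small helper. This is a sketch under assumptions, not the provider's actual code: `fetch_with_retry`, its `delay_s` parameter, the choice to retry on any exception, and the flaky stand-in function are all hypothetical; only the count of 3 attempts comes from the description above.

```python
import time
from typing import Any, Callable, Dict, Optional


def fetch_with_retry(
    fetch: Callable[[], Dict[str, Any]],
    attempts: int = 3,
    delay_s: float = 0.0,
) -> Dict[str, Any]:
    """Call fetch() until it succeeds or the attempts are used up."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:  # assumption: any failure is retryable
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_s)
    raise RuntimeError(f"UI tree fetch failed after {attempts} attempts") from last_error


# Stand-in for driver.get_ui_tree() that fails twice, then succeeds.
calls = {"n": 0}


def flaky_get_ui_tree() -> Dict[str, Any]:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("device busy")
    return {"index": 0, "children": []}


tree = fetch_with_retry(flaky_get_ui_tree)
```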
UIState
Methods:

- `get_element(index: int) -> Dict | None` - Recursively find an element by its index
- `get_element_coords(index: int) -> Tuple[int, int]` - Return the centre (x, y) of an element. Raises `ValueError` when the element is missing or has no bounds.
- `get_element_info(index: int) -> Dict` - Return element metadata (text, className, type, child_texts)

Attributes:

- `elements` - List of parsed UI elements
- `formatted_text` - Formatted text representation of the UI tree
- `focused_text` - Text of the currently focused element
- `phone_state` - Dict with current activity, keyboard visibility, etc.
- `screen_width` / `screen_height` - Device screen dimensions
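
The recursive lookup and centre calculation can be sketched as free functions. The element shape is an assumption made for illustration: each element is a dict with an `index`, an optional `bounds` tuple of `(left, top, right, bottom)`, and a `children` list. The real `UIState` may store bounds differently.

```python
from typing import Any, Dict, List, Optional, Tuple

Element = Dict[str, Any]


def get_element(elements: List[Element], index: int) -> Optional[Element]:
    """Depth-first search for the element whose 'index' matches."""
    for element in elements:
        if element.get("index") == index:
            return element
        found = get_element(element.get("children", []), index)
        if found is not None:
            return found
    return None


def get_element_coords(elements: List[Element], index: int) -> Tuple[int, int]:
    """Return the centre of the element, or raise ValueError."""
    element = get_element(elements, index)
    if element is None:
        raise ValueError(f"No element with index {index}")
    bounds = element.get("bounds")
    if not bounds:
        raise ValueError(f"Element {index} has no bounds")
    left, top, right, bottom = bounds  # assumed layout of the bounds tuple
    return ((left + right) // 2, (top + bottom) // 2)


elements = [
    {
        "index": 1,
        "bounds": (0, 0, 100, 50),
        "children": [
            {"index": 2, "bounds": (10, 10, 30, 20), "children": []},
            {"index": 3, "children": []},  # nested element without bounds
        ],
    }
]

centre = get_element_coords(elements, 2)
```

The coordinates returned this way are what an action would pass on to `tap(x, y)`.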
ActionContext
- `driver` - `DeviceDriver` instance for raw device I/O
- `ui` - `UIState` instance for element resolution (refreshed each step)
- `shared_state` - `DroidAgentState` for shared agent state
- `state_provider` - `StateProvider` for fetching fresh UI state
- `app_opener_llm` - LLM instance for app opening workflow (optional)
- `credential_manager` - `CredentialManager` instance (optional)
- `streaming` - Whether streaming is enabled
ActionResult
Structured return type with two fields, `success: bool` and `summary: str`. The `summary` field is what the agent sees.
Action Functions
Action functions live in `agent/utils/actions.py`. Each receives the `ActionContext` as a `ctx` keyword argument and returns an `ActionResult`. Available actions:

- `click(index)` - Click UI element by index
- `click_at(x, y)` - Click at screen coordinates
- `click_area(area)` - Click a named screen area
- `long_press(index)` - Long press UI element by index
- `long_press_at(x, y)` - Long press at screen coordinates
- `type(text, index)` - Input text into element
- `type_secret(secret_id)` - Input a credential secret
- `swipe(coordinate, coordinate2)` - Swipe gesture
- `system_button(button)` - Press system buttons (back, home, enter)
- `open_app(text)` - Open app by name
- `wait(seconds)` - Wait for a duration
- `remember(information)` - Store info in agent memory
- `complete(success, reason)` - Mark task as finished
- `get_state()` - Get accessibility tree + phone state
- `take_screenshot()` - Capture device screen
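
A rough sketch of how such an action could be shaped, using a `click`-style function. Everything here is a stand-in: the two dataclasses are trimmed to the fields this example needs, `FakeDriver` and `FakeUI` are hypothetical test doubles, and the function is synchronous for brevity, which may not match the real code in `agent/utils/actions.py`.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class ActionResult:
    success: bool
    summary: str  # the text the agent sees


@dataclass
class ActionContext:
    driver: Any  # raw device I/O
    ui: Any      # element resolution


class FakeDriver:
    """Hypothetical driver that records taps."""

    supported = {"tap"}

    def __init__(self) -> None:
        self.taps: List[Tuple[int, int]] = []

    def tap(self, x: int, y: int) -> None:
        self.taps.append((x, y))


class FakeUI:
    """Hypothetical UI state that knows a single element, index 1."""

    def get_element_coords(self, index: int) -> Tuple[int, int]:
        if index != 1:
            raise ValueError(f"No element with index {index}")
        return (50, 25)


def click(index: int, *, ctx: ActionContext) -> ActionResult:
    """Resolve the element through ctx.ui, then tap through ctx.driver."""
    try:
        x, y = ctx.ui.get_element_coords(index)
    except ValueError as exc:
        return ActionResult(False, f"Cannot click element {index}: {exc}")
    ctx.driver.tap(x, y)
    return ActionResult(True, f"Clicked element {index} at ({x}, {y})")


ctx = ActionContext(driver=FakeDriver(), ui=FakeUI())
ok = click(1, ctx=ctx)
bad = click(7, ctx=ctx)
```

Returning a failed `ActionResult` instead of raising keeps the error in the `summary` text, where the agent can read it.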
ToolRegistry
- `register(name, fn, params, description)` - Register a single tool
- `register_from_dict(tools_dict)` - Register tools from `{"name": {"parameters": {...}, "description": "...", "function": callable}}` format
- `disable(tool_names)` - Remove tools by name
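
To make the three methods concrete, here is a toy registry with the same method names. `MiniToolRegistry` is a hypothetical stand-in written for this page; the storage layout, the shape of `params`, and the example tools are assumptions, and the real `agent/tool_registry.py` may behave differently.

```python
from typing import Any, Callable, Dict, Iterable


class MiniToolRegistry:
    """Toy registry mirroring register / register_from_dict / disable."""

    def __init__(self) -> None:
        self.tools: Dict[str, Dict[str, Any]] = {}

    def register(
        self,
        name: str,
        fn: Callable[..., Any],
        params: Dict[str, Any],
        description: str,
    ) -> None:
        self.tools[name] = {
            "function": fn,
            "parameters": params,
            "description": description,
        }

    def register_from_dict(self, tools_dict: Dict[str, Dict[str, Any]]) -> None:
        # Each entry uses the documented keys: parameters, description, function.
        for name, spec in tools_dict.items():
            self.register(name, spec["function"], spec["parameters"], spec["description"])

    def disable(self, tool_names: Iterable[str]) -> None:
        # Unknown names are ignored in this sketch.
        for name in tool_names:
            self.tools.pop(name, None)


registry = MiniToolRegistry()
registry.register("noop", lambda: None, {}, "Do nothing")
registry.register_from_dict(
    {
        "echo": {
            "parameters": {"text": {"type": "string"}},  # assumed parameter schema
            "description": "Return the text unchanged",
            "function": lambda text: text,
        }
    }
)
registry.disable(["noop"])
```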
Custom Tool Integration
Adding Custom Tools
Platform Comparison
| Feature | AndroidDriver | IOSDriver |
|---|---|---|
| Connection | ADB + Portal (USB/TCP) | HTTP (Portal app) |
| tap | Absolute coordinates | Absolute coordinates |
| swipe | Coordinate-based | Direction-based |
| drag | Declared but not yet implemented | Not supported |
| input_text | With clear support | No clear support |
| press_key | Full Android keycodes | HOME only (BACK/ENTER unsupported) |
| screenshot | PNG via Portal | PNG via HTTP |
| get_ui_tree | Accessibility tree + phone state | Accessibility tree + phone state |
| get_date | Via ADB shell | Not available (returns empty) |
| get_apps | Full packages with labels | Bundle identifiers only |
Best Practices
1. Check supported methods before calling
2. Use ActionContext for agent-level interactions
3. Use StateProvider for UI state
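
The first practice can be wrapped in a small caller-side helper. This is a sketch, not part of the library: `call_if_supported` and `DemoDriver` are hypothetical. The helper also catches `NotImplementedError`, since the comparison table above notes that a method can be declared without being implemented yet.

```python
from typing import Any


def call_if_supported(driver: Any, method: str, *args: Any, default: Any = None, **kwargs: Any) -> Any:
    """Call driver.<method> only when it is listed in driver.supported."""
    if method not in getattr(driver, "supported", set()):
        return default
    try:
        return getattr(driver, method)(*args, **kwargs)
    except NotImplementedError:
        # Declared in `supported` but not implemented yet.
        return default


class DemoDriver:
    """Hypothetical driver: get_date works, drag is declared only."""

    supported = {"get_date", "drag"}

    def get_date(self) -> str:
        return "2024-01-01 12:00:00"

    def drag(self, x1: int, y1: int, x2: int, y2: int, duration: float = 3.0) -> None:
        raise NotImplementedError


demo = DemoDriver()
date = call_if_supported(demo, "get_date", default="")
dragged = call_if_supported(demo, "drag", 0, 0, 10, 10, default=False)
pressed = call_if_supported(demo, "press_key", 3, default=False)
```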
Error Handling
Driver methods use consistent error handling: any method a driver does not implement raises `NotImplementedError`.
See Also
- AndroidDriver API - Android driver implementation
- IOSDriver API - iOS driver implementation
- DroidAgent API - Agent integration
- Configuration - Configuration reference

