Skip to main content
UI Actions - Core UI interaction tools for Android device control via ADB.

AdbTools

class AdbTools(Tools)
Core UI interaction tools for Android device control. AdbTools provides a comprehensive interface for interacting with Android devices through ADB (Android Debug Bridge). It supports both TCP communication and content provider modes for device communication via the Droidrun Portal app.

AdbTools.__init__

def __init__(
    serial: str | None = None,
    use_tcp: bool = False,
    remote_tcp_port: int = 8080,
    app_opener_llm=None,
    text_manipulator_llm=None,
    credential_manager=None
) -> None
Initialize the AdbTools instance. Arguments:
  • serial str | None - Device serial number (e.g., “emulator-5554”, “192.168.1.100:5555”). If None, auto-detects the first available device.
  • use_tcp bool - Whether to prefer TCP communication (default: False). TCP is faster but requires port forwarding. Falls back to content provider mode if TCP fails.
  • remote_tcp_port int - TCP port for Portal app communication on device (default: 8080)
  • app_opener_llm LLM | None - LLM instance for app opening workflow (optional). Used by helper tools to open apps by natural language description.
  • text_manipulator_llm LLM | None - LLM instance for text manipulation (optional). Used by helper tools for text editing operations.
  • credential_manager CredentialManager | None - CredentialManager instance for secret handling (optional). Enables secure credential access in automation workflows.
Usage:
from droidrun.tools import AdbTools

# Auto-detect device
tools = AdbTools()

# Specific device
tools = AdbTools(serial="emulator-5554")

# TCP mode (faster communication, requires port forwarding)
tools = AdbTools(serial="emulator-5554", use_tcp=True)

# With LLM support for advanced workflows
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-4")
tools = AdbTools(
    serial="emulator-5554",
    app_opener_llm=llm,
    text_manipulator_llm=llm
)
Notes:
  • Automatically sets up the Droidrun Portal keyboard on initialization via setup_keyboard()
  • Creates a PortalClient instance that handles TCP/content provider communication
  • Device serial can be emulator name, USB serial, or TCP/IP address:port

UI Interaction Methods

AdbTools.tap_by_index

def tap_by_index(index: int) -> str
Tap on a UI element by its index. This function uses the cached clickable elements to find the element with the given index and tap on its center coordinates. Arguments:
  • index int - Index of the element to tap (from accessibility tree)
Returns:
  • str - Result message describing the tapped element
Usage:
# Get UI state to populate element cache
state = tools.get_state()

# Tap element at index 5
result = tools.tap_by_index(5)
print(result)
# Output: "Tapped element with index 5 | Text: 'Submit' | Class: android.widget.Button | Type: clickable | Coordinates: (540, 960)"
Notes:
  • Call get_state() first to populate the clickable elements cache
  • Returns descriptive error message (not raises exception) if element index is invalid
  • Error message includes up to 20 available indices to help with debugging
  • Automatically searches nested children for the target index
  • Returns detailed information about tapped element including text, class, type, and child content

AdbTools.tap_by_coordinates

def tap_by_coordinates(x: int, y: int) -> bool
Tap on the device screen at specific coordinates. Arguments:
  • x int - X coordinate
  • y int - Y coordinate
Returns:
  • bool - True if tap succeeded, False otherwise
Usage:
# Tap at specific screen coordinates
success = tools.tap_by_coordinates(540, 960)

AdbTools.tap

def tap(index: int) -> str
Tap on a UI element by its index. Alias for tap_by_index(). This function uses the cached clickable elements from the last get_state() call to find the element with the given index and tap on its center coordinates. Arguments:
  • index int - Index of the element to tap
Returns:
  • str - Result message

AdbTools.swipe

def swipe(
    start_x: int,
    start_y: int,
    end_x: int,
    end_y: int,
    duration_ms: float = 300
) -> bool
Performs a straight-line swipe gesture on the device screen. To perform a hold (long press), set the start and end coordinates to the same values and increase the duration as needed. Arguments:
  • start_x int - Starting X coordinate
  • start_y int - Starting Y coordinate
  • end_x int - Ending X coordinate
  • end_y int - Ending Y coordinate
  • duration_ms float - Duration of swipe in milliseconds (default: 300)
Returns:
  • bool - True if swipe succeeded, False otherwise
Usage:
# Swipe up (scroll down content)
tools.swipe(540, 1500, 540, 500, duration_ms=300)

# Swipe left
tools.swipe(800, 960, 200, 960, duration_ms=250)

# Long press (hold for 2 seconds at same position)
tools.swipe(540, 960, 540, 960, duration_ms=2000)
Notes:
  • Emits SwipeActionEvent when context is set for trajectory tracking
  • Uses @Tools.ui_action decorator for automatic screenshot capture
  • Duration is converted to seconds internally (dividing by 1000)

AdbTools.drag

def drag(
    start_x: int,
    start_y: int,
    end_x: int,
    end_y: int,
    duration: float = 3
) -> bool
Performs a straight-line drag and drop gesture on the device screen. Arguments:
  • start_x int - Starting X coordinate
  • start_y int - Starting Y coordinate
  • end_x int - Ending X coordinate
  • end_y int - Ending Y coordinate
  • duration float - Duration of drag in seconds (default: 3)
Returns:
  • bool - True if drag succeeded, False otherwise
Usage:
# Drag element from one position to another (3 second duration)
tools.drag(200, 500, 800, 1200, duration=3)

# Faster drag (1 second)
tools.drag(200, 500, 800, 1200, duration=1)
Notes:
  • Emits DragActionEvent when context is set for trajectory tracking
  • Uses @Tools.ui_action decorator for automatic screenshot capture
  • Includes sleep after drag operation to allow UI to settle

AdbTools.input_text

def input_text(text: str, index: int = -1, clear: bool = False) -> str
Input text on the device. Always make sure that a text field is focused before inputting text. Arguments:
  • text str - Text to input. Can contain spaces, newlines, and special characters including non-ASCII.
  • index int - Index of the element to input text into. If -1, uses the currently focused element (default: -1).
  • clear bool - Whether to clear existing text before inputting (default: False)
Returns:
  • str - Result message
Usage:
# Focus element first, then input text
tools.tap_by_index(3)  # Focus text field
result = tools.input_text("Hello World")

# Input into specific element by index
result = tools.input_text("user@example.com", index=5)

# Clear existing text and input new text
result = tools.input_text("New text", index=5, clear=True)

# Unicode support
result = tools.input_text("你好世界")  # Chinese characters
result = tools.input_text("Hello\nWorld")  # Multiline text
Notes:
  • Always ensure a text field is focused before inputting text (use tap_by_index() or set index parameter)
  • Uses the Droidrun Portal app keyboard for reliable text input via PortalClient
  • Supports Unicode characters and special characters including non-ASCII
  • If index != -1, automatically taps the element first before inputting text
  • Call get_state() first to populate element cache if using index parameter
  • Emits InputTextActionEvent when context is set for trajectory tracking
  • Uses @Tools.ui_action decorator for automatic screenshot capture
  • Text longer than 50 characters is truncated in result message (but fully input to device)

AdbTools.back

def back() -> str
Go back on the current view. This presses the Android back button (keycode 4). Returns:
  • str - Result message
Usage:
result = tools.back()  # Press back button
print(result)  # Output: "Pressed key BACK"
Notes:
  • Uses Android keycode 4 (KEYCODE_BACK)
  • Emits KeyPressActionEvent when context is set for trajectory tracking
  • Uses @Tools.ui_action decorator for automatic screenshot capture

AdbTools.press_key

def press_key(keycode: int) -> str
Press a key on the Android device. Common keycodes:
  • 3: HOME
  • 4: BACK
  • 66: ENTER
  • 67: DELETE
Full keycode reference: Android KeyEvent Documentation Arguments:
  • keycode int - Android keycode to press
Returns:
  • str - Result message with key name
Usage:
# Press enter
result = tools.press_key(66)
print(result)  # Output: "Pressed key ENTER"

# Press home button
tools.press_key(3)  # Output: "Pressed key HOME"

# Press back button
tools.press_key(4)  # Output: "Pressed key BACK"

# Press delete
tools.press_key(67)  # Output: "Pressed key DELETE"

# Unknown keycodes display as number
tools.press_key(999)  # Output: "Pressed key 999"
Notes:
  • Common keycodes (3, 4, 66, 67) are mapped to readable names (HOME, BACK, ENTER, DELETE)
  • Emits KeyPressActionEvent when context is set for trajectory tracking
  • Uses @Tools.ui_action decorator for automatic screenshot capture

App Management Methods

AdbTools.start_app

def start_app(package: str, activity: str | None = None) -> str
Start an app on the device. If activity is not provided, automatically resolves the main/launcher activity using cmd package resolve-activity. Arguments:
  • package str - Package name (e.g., “com.android.settings”, “com.google.android.apps.messaging”)
  • activity str | None - Optional activity name (e.g., ”.Settings”). If None, auto-detects the main launcher activity.
Returns:
  • str - Result message indicating success or error
Usage:
# Auto-detect main activity
result = tools.start_app("com.android.settings")
print(result)
# Output: "App started: com.android.settings with activity .Settings"

# Specific activity
result = tools.start_app("com.android.settings", ".Settings")

# Chrome browser (auto-detects launcher activity)
result = tools.start_app("com.android.chrome")
Notes:
  • Uses cmd package resolve-activity --brief to auto-detect main activity when not specified
  • Emits StartAppEvent when context is set for trajectory tracking

AdbTools.install_app

def install_app(
    apk_path: str,
    reinstall: bool = False,
    grant_permissions: bool = True
) -> str
Install an app on the device. Arguments:
  • apk_path str - Path to the APK file on the local machine
  • reinstall bool - Whether to reinstall if app already exists (default: False)
  • grant_permissions bool - Whether to grant all permissions automatically (default: True)
Returns:
  • str - Result message indicating success or error
Usage:
# Install new app
result = tools.install_app("/path/to/app.apk")
print(result)

# Reinstall existing app
result = tools.install_app("/path/to/app.apk", reinstall=True)

# Install without granting permissions
result = tools.install_app("/path/to/app.apk", grant_permissions=False)
Notes:
  • APK file must exist on the local machine (not on device)
  • Returns error message if APK file is not found
  • With grant_permissions=True, automatically grants runtime permissions via -g flag

AdbTools.list_packages

def list_packages(include_system_apps: bool = False) -> List[str]
List installed packages on the device. Arguments:
  • include_system_apps bool - Whether to include system apps (default: False)
Returns:
  • List[str] - List of package names
Usage:
# User-installed apps only
packages = tools.list_packages()
print(packages)
# Output: ['com.example.app1', 'com.example.app2', ...]

# Include system apps
all_packages = tools.list_packages(include_system_apps=True)

AdbTools.get_apps

def get_apps(include_system: bool = True) -> List[Dict[str, str]]
Get installed apps with package name and human-readable label. Arguments:
  • include_system bool - Whether to include system apps (default: True)
Returns:
  • List[Dict[str, str]] - List of dictionaries containing ‘package’ and ‘label’ keys
Usage:
apps = tools.get_apps(include_system=False)
for app in apps:
    print(f"{app['label']}: {app['package']}")

# Output:
# Chrome: com.android.chrome
# Gmail: com.google.android.gm
# Messages: com.google.android.apps.messaging

State and Screenshot Methods

AdbTools.get_state

def get_state() -> Dict[str, Any]
Get both the accessibility tree and phone state in a single call. This is the primary method for retrieving UI information from the device. It combines accessibility tree (UI elements) and phone state (current activity, keyboard visibility) into a single response. This method also populates the internal clickable_elements_cache used by tap_by_index(). Returns:
  • Dict[str, Any] - Dictionary containing both ‘a11y_tree’ and ‘phone_state’ data:
    • a11y_tree: List of UI elements with indices, text, class names, bounds, etc.
    • phone_state: Current activity name, keyboard visibility, etc.
Usage:
state = tools.get_state()

# Access accessibility tree
for element in state['a11y_tree']:
    print(f"Index {element['index']}: {element['text']} ({element['className']})")

# Access phone state
current_activity = state['phone_state']['current_activity']
keyboard_shown = state['phone_state']['keyboard_shown']

print(f"Current activity: {current_activity}")
print(f"Keyboard visible: {keyboard_shown}")
Element structure:
{
    "index": 5,
    "className": "android.widget.Button",
    "text": "Submit",
    "bounds": "100,200,300,400",  # left,top,right,bottom
    "clickable": True,
    "children": []  # Nested elements (if any)
}
Phone state structure:
{
    "current_activity": "com.android.chrome/.MainActivity",
    "keyboard_shown": False
}
Notes:
  • Always call this method before using tap_by_index() to populate the element cache
  • The type attribute is filtered out from elements in the returned tree
  • Uses PortalClient which automatically selects TCP or content provider mode

AdbTools.take_screenshot

def take_screenshot(hide_overlay: bool = True) -> Tuple[str, bytes]
Take a screenshot of the device. This function captures the current screen and stores the screenshot with timestamp for trajectory recording. Screenshots are automatically stored in the screenshots list with timestamp information. Arguments:
  • hide_overlay bool - Whether to hide Portal app overlay elements during screenshot (default: True)
Returns:
  • Tuple[str, bytes] - Tuple of (format, image_bytes) where format is “PNG” and image_bytes is the PNG image data
Usage:
# Take screenshot (hides Portal overlay by default)
format, image_bytes = tools.take_screenshot()
print(f"Screenshot format: {format}")  # Output: "PNG"

# Save to file
with open("screenshot.png", "wb") as f:
    f.write(image_bytes)

# Take screenshot with overlay visible
format, image_bytes = tools.take_screenshot(hide_overlay=False)
Notes:
  • Screenshots are automatically stored in tools.screenshots list with timestamp
  • Each screenshot entry contains: {"timestamp": float, "image_data": bytes, "format": "PNG"}
  • Uses PortalClient for screenshot capture

AdbTools.get_date

def get_date() -> str
Get the current date and time on device. Returns:
  • str - Date and time string from device
Usage:
date = tools.get_date()
print(f"Device date: {date}")
# Output: "Thu Jan 16 14:30:25 UTC 2025"

Device Communication Methods

AdbTools.ping

def ping() -> Dict[str, Any]
Test the Portal connection. Returns:
  • Dict[str, Any] - Dictionary with ping result
Usage:
result = tools.ping()
if result.get("status") == "ok":
    print("Portal connection successful")
else:
    print(f"Portal connection failed: {result}")

Memory and Completion Methods

AdbTools.remember

def remember(information: str) -> str
Store important information to remember for future context. This information will be extracted and included in future agent steps to maintain context across interactions. Use this for critical facts, observations, or user preferences that should influence future decisions. Arguments:
  • information str - The information to remember
Returns:
  • str - Confirmation message
Usage:
# Remember user preferences
tools.remember("User prefers dark mode")

# Remember important state
tools.remember("Flight booking confirmation code: ABC123")

# Remember task progress
tools.remember("Already sent email to john@example.com")
Notes:
  • Memory is limited to 10 most recent items
  • Memory persists for the duration of the agent’s execution
  • Memory is accessible via get_memory() or automatically included in agent context

AdbTools.get_memory

def get_memory() -> List[str]
Retrieve all stored memory items. Returns:
  • List[str] - List of stored memory items
Usage:
memory = tools.get_memory()
for item in memory:
    print(f"- {item}")

AdbTools.complete

def complete(success: bool, reason: str = "")
Mark the task as finished. Arguments:
  • success bool - Indicates if the task was successful
  • reason str - Reason for failure/success (optional for success, required if success=False)
Usage:
# Success
tools.complete(success=True, reason="Successfully sent message to John")

# Failure
tools.complete(success=False, reason="Could not find contact 'John' in contacts app")
Notes:
  • This sets internal flags (finished, success, reason) used by agents to determine completion
  • If success=False, reason is required (raises ValueError if not provided)
  • If success=True and no reason provided, defaults to “Task completed successfully.”
  • Uses @Tools.ui_action decorator for automatic screenshot capture
  • This does not terminate execution, it only sets completion flags

Properties

Instance variables:
  • device - ADB device instance (from adbutils)
  • portal - PortalClient instance for device communication (TCP or content provider mode)
  • clickable_elements_cache - List of cached UI elements from last get_state() call
  • memory - List of remembered information items (max 10)
  • screenshots - List of captured screenshots with timestamps (format: [{"timestamp": float, "image_data": bytes, "format": "PNG"}])
  • save_trajectories - Trajectory saving level: “none”, “step”, or “action”
  • finished - Boolean indicating if task is complete (set by complete())
  • success - Boolean indicating if task succeeded (set by complete())
  • reason - String describing success/failure reason (set by complete())
  • app_opener_llm - LLM instance for app opening workflow (optional)
  • text_manipulator_llm - LLM instance for text manipulation (optional)
  • credential_manager - CredentialManager instance for secret handling (optional)

Notes

  • Portal app required: The Droidrun Portal app must be installed and accessibility service enabled on the device
  • TCP vs Content Provider: TCP is faster but requires port forwarding (adb forward tcp:8080 tcp:8080). Content provider is the fallback mode using ADB shell commands.
  • Element caching: Always call get_state() before using tap_by_index() or tap() to populate the element cache
  • Trajectory recording: When save_trajectories="action", screenshots and UI states are automatically captured for each UI action via the @Tools.ui_action decorator
  • Unicode support: input_text() supports Unicode characters and special characters via the Portal app’s custom keyboard
  • Event streaming: When a context is set via _set_context(), action events (TapActionEvent, SwipeActionEvent, etc.) are emitted for trajectory tracking
  • Decorator behavior: Methods decorated with @Tools.ui_action automatically capture screenshots and emit events when trajectory recording is enabled

Example Workflow

from droidrun.tools import AdbTools

# Initialize tools (auto-detects device if serial not provided)
tools = AdbTools(serial="emulator-5554", use_tcp=True)

# Check Portal connection
ping_result = tools.ping()
print(f"Portal status: {ping_result}")

# Start Chrome app (auto-detects main activity)
result = tools.start_app("com.android.chrome")
print(result)

# Get UI state (populates clickable_elements_cache)
state = tools.get_state()
print(f"Current activity: {state['phone_state']['current_activity']}")
print(f"Keyboard shown: {state['phone_state']['keyboard_shown']}")

# Find and tap search bar by iterating through elements
for element in state['a11y_tree']:
    element_text = element.get('text', '').lower()
    if 'search' in element_text or 'address' in element_text:
        result = tools.tap_by_index(element['index'])
        print(result)
        break

# Input search query
result = tools.input_text("Droidrun framework")
print(result)

# Press enter key
result = tools.press_key(66)
print(result)

# Take screenshot
format, screenshot = tools.take_screenshot()
print(f"Screenshot format: {format}, size: {len(screenshot)} bytes")
with open("search_result.png", "wb") as f:
    f.write(screenshot)

# Remember result for future context
tools.remember("Searched for Droidrun framework in Chrome")

# Complete task
tools.complete(success=True, reason="Successfully searched for Droidrun in Chrome")

# Check completion status
print(f"Task finished: {tools.finished}")
print(f"Task success: {tools.success}")
print(f"Reason: {tools.reason}")
I