This guide explains how to use the Droidrun framework with Gemini, Google’s family of advanced large language models. By integrating Gemini with Droidrun, you can leverage powerful cloud-based LLMs to automate Android devices, build intelligent agents, and experiment with advanced workflows.
Gemini is Google's family of state-of-the-art language models, available via the Gemini API. It supports text, code, and multimodal (vision) inputs, and is accessible through a simple HTTP API.
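To make the "simple HTTP API" point concrete, here is a sketch of how a Gemini `generateContent` request is typically shaped. The endpoint path and payload structure below reflect the public REST API at the time of writing, but treat them as assumptions and check Google's current API reference; the helper function name is ours.

```python
import json

def build_gemini_request(prompt: str, model: str = "gemini-2.5-flash"):
    """Build the URL and JSON body for a Gemini generateContent call.

    The endpoint and payload shape are based on the public REST API docs;
    verify against Google's current reference before relying on them.
    """
    url = (
        "https://generativelanguage.googleapis.com/v1beta/"
        f"models/{model}:generateContent"
    )
    # The API expects a list of "contents", each with text "parts".
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, json.dumps(payload)
```

In practice you would POST this body with your API key in the `x-goog-api-key` header (or as a query parameter); Droidrun's `GoogleGenAI` integration handles all of this for you.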
Here is a minimal example of using Droidrun with Gemini as the LLM backend:
```python
import asyncio

from llama_index.llms.google_genai import GoogleGenAI
from droidrun import DroidAgent, AdbTools

async def main():
    # Load adb tools for the first connected device
    tools = AdbTools()

    # Set up the Gemini LLM
    llm = GoogleGenAI(
        api_key="YOUR_GEMINI_API_KEY",  # Replace with your Gemini API key
        model="gemini-2.5-flash",       # or "gemini-2.5-pro" for enhanced reasoning
    )

    # Create the DroidAgent
    agent = DroidAgent(
        goal="Open Settings and check battery level",
        llm=llm,
        tools=tools,
        vision=True,      # Set to True for vision models, False for text-only
        reasoning=False,  # Optional: enable planning/reasoning
    )

    # Run the agent
    result = await agent.run()
    print(f"Success: {result['success']}")
    if result.get('output'):
        print(f"Output: {result['output']}")

if __name__ == "__main__":
    asyncio.run(main())
```
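Rather than hardcoding the key as in the example above, it is safer to read it from the environment. A minimal sketch, assuming you export a `GEMINI_API_KEY` variable in your shell (the variable name and helper function are our choices, not a Droidrun convention):

```python
import os

def load_gemini_key(var: str = "GEMINI_API_KEY") -> str:
    """Read the Gemini API key from an environment variable.

    Keeps the secret out of source control; raises early with a clear
    message if the variable is missing.
    """
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set the {var} environment variable before running")
    return key
```

You would then pass `api_key=load_gemini_key()` when constructing `GoogleGenAI`.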