Using Droidrun with Gemini
This guide explains how to use the Droidrun framework with Gemini, Google’s family of large language models. With Gemini as the LLM backend, you can use cloud-hosted models to automate Android devices, build agents, and experiment with more complex workflows.
What is Gemini?
Gemini is a suite of state-of-the-art language models developed by Google, available via the Gemini API. It supports text, code, and multimodal (vision) capabilities, and is accessible through a simple HTTP API.
Why Use Gemini with Droidrun?
- Accuracy: Access to Google’s latest, high-quality LLMs.
- Multimodal: Supports both text and vision (image) inputs.
- Scalability: Cloud-based, no local hardware requirements.
Prerequisites
- Google Cloud account with access to the Gemini API.
- Python 3.10+
- droidrun framework installed (see Droidrun Quickstart).
Make sure you’ve set up and enabled the Droidrun Portal.
1. Set Up Gemini API Access
- Go to the Gemini API Console and create an API key.
- Save your API key securely. You will use it in your Python code.
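One way to keep the key out of your source files is to read it from an environment variable at startup. The sketch below assumes the key is stored in a variable named `GOOGLE_API_KEY`. Both that name and the helper `load_gemini_api_key` are illustrative choices, not something Droidrun prescribes.

```python
import os


def load_gemini_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it is missing.

    The default variable name is an assumption; use whatever name you exported.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {env_var} is not set; "
            "export your Gemini API key before running the agent."
        )
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later, in the middle of an agent run.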
2. Install Required Python Packages
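Whichever install route you follow, a quick import check can confirm that the packages are visible to the interpreter you plan to use. This sketch assumes the relevant modules are `droidrun` and `llama_index.llms.google_genai` (the LlamaIndex Gemini integration). The second name is an assumption about which LLM client your Droidrun version relies on, so adjust the list to match your install.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A missing parent package raises instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing


# Module names are assumptions; edit them to match your setup.
REQUIRED = ["droidrun", "llama_index.llms.google_genai"]

if __name__ == "__main__":
    gaps = missing_modules(REQUIRED)
    print("All packages found." if not gaps else f"Missing: {gaps}")
```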
3. Example: Using Droidrun with Gemini LLM
Here is a minimal example of using Droidrun with Gemini as the LLM backend:
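The sketch below shows roughly what such a script can look like. It rests on several assumptions:

- Your Droidrun release exposes `DroidAgent` and `AdbTools` at the package top level and accepts `goal`, `llm`, and `tools` arguments.
- The `GoogleGenAI` client comes from the `llama-index-llms-google-genai` package.
- The API key is available in a `GOOGLE_API_KEY` environment variable.

Constructor signatures have changed between Droidrun versions, so check these names against the DroidAgent documentation for your install. The goal string is only an example.

```python
import asyncio
import os


async def run_goal(goal: str, model: str = "gemini-2.5-flash"):
    """Run one Droidrun task with Gemini as the LLM backend (sketch)."""
    # Imported lazily so a missing package fails here, with a clear traceback.
    # Both import paths are assumptions about the installed versions.
    from droidrun import AdbTools, DroidAgent
    from llama_index.llms.google_genai import GoogleGenAI

    # Assumed variable name; the key itself stays out of the source file.
    llm = GoogleGenAI(model=model, api_key=os.environ["GOOGLE_API_KEY"])

    # Assumes a device reachable over ADB with the Droidrun Portal enabled.
    tools = AdbTools()

    # Assumed constructor arguments; newer releases may differ.
    agent = DroidAgent(goal=goal, llm=llm, tools=tools)
    return await agent.run()


def main() -> None:
    """Hypothetical entry point: run a sample goal and print the result."""
    result = asyncio.run(run_goal("Open Settings and check the battery level"))
    print(result)
```

To try it, call `main()` with a device connected. The shape of the returned result depends on the Droidrun version, so the sketch prints it as-is rather than reading specific fields.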
4. Troubleshooting
- Invalid API key: Double-check your Gemini API key and permissions.
- Model not found: Use the correct model name, e.g., "gemini-2.5-flash" or "gemini-2.5-pro".
- Quota exceeded: Check your Google Cloud usage and quotas.
- Connection errors: Ensure your network allows outbound HTTPS requests to the Gemini API.
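To separate network problems from API-key problems, you can first test whether a TLS connection to the API host succeeds at all. The sketch below uses only the standard library. The helper name `can_reach` and the 5-second timeout are illustrative, and the host you pass in should be whichever endpoint your client actually calls.

```python
import socket
import ssl


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TLS handshake with host:port completes."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts, and TLS errors.
        return False
```

For the public Gemini API, the call would be `can_reach("generativelanguage.googleapis.com")`. A `False` result points to a firewall, proxy, or DNS issue rather than a problem with your key or quota.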
5. Tips
- For advanced configuration, see the DroidAgent documentation and Gemini API docs.
- Store your API key securely (e.g., use environment variables or a secrets manager).
With this setup, you can harness the power of Google’s Gemini models for Android automation and agent-based workflows using Droidrun!