What is Gemini?
Gemini is a suite of state-of-the-art language models developed by Google, available via the Gemini API. The models support text, code, and multimodal (vision) inputs and are accessible through a simple HTTP API.
Why Use Gemini with Droidrun?
- Accuracy: Access to Google’s latest, high-quality LLMs.
- Multimodal: Supports both text and vision (image) inputs.
- Scalability: Cloud-based, no local hardware requirements.
Prerequisites
- Google Cloud account with access to the Gemini API.
- Python 3.10+
- droidrun framework installed (see Droidrun Quickstart).
Make sure you’ve set up and enabled the Droidrun Portal.
1. Set Up Gemini API Access
- Go to the Gemini API Console and create an API key.
- Save your API key securely. You will use it in your Python code.
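One way to keep the key out of your source files is to read it from an environment variable at startup. The following is a minimal sketch using only the standard library; the variable name `GOOGLE_API_KEY` and the helper `load_gemini_api_key` are assumptions made for illustration, not names defined by Droidrun.

```python
import os

# Hypothetical variable name chosen for this sketch; use whatever name your
# deployment or client library expects.
API_KEY_ENV_VAR = "GOOGLE_API_KEY"


def load_gemini_api_key(env=None):
    """Return the Gemini API key from the environment, or raise if it is unset."""
    # Allow a plain dict to be passed in so the function is easy to test.
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {API_KEY_ENV_VAR} environment variable to your Gemini API key."
        )
    return key
```

Calling `load_gemini_api_key()` with no arguments reads from `os.environ`. Failing early with a clear message is usually easier to debug than an authentication error raised later by the API.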
2. Install Required Python Packages
3. Example: Using Droidrun with Gemini LLM
Here is a minimal example of using Droidrun with Gemini as the LLM backend:
4. Troubleshooting
- Invalid API key: Double-check your Gemini API key and permissions.
- Model not found: Use the correct model name, e.g., "gemini-2.5-flash" or "gemini-2.5-pro".
- Quota exceeded: Check your Google Cloud usage and quotas.
- Connection errors: Ensure your network allows outbound HTTPS requests to the Gemini API.
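To narrow down connection errors, it can help to confirm that an outbound connection on port 443 succeeds at all. The sketch below is a rough check under stated assumptions: the helper `can_reach` is hypothetical, and the hostname `generativelanguage.googleapis.com` is the public Gemini API endpoint rather than something specified in this guide. It only tests TCP reachability, so it says nothing about TLS, proxies, or whether your API key is valid.

```python
import socket

# Public hostname of the Gemini API endpoint (not stated in this guide).
GEMINI_API_HOST = "generativelanguage.googleapis.com"


def can_reach(host=GEMINI_API_HOST, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

If `can_reach()` returns `False`, the problem most likely lies with DNS, a firewall, or a proxy rather than with Droidrun or your API key.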
5. Tips
- For advanced configuration, see the DroidAgent documentation and Gemini API docs.
- Store your API key securely (e.g., use environment variables or a secrets manager).
With this setup, you can harness the power of Google’s Gemini models for Android automation and agent-based workflows using Droidrun!