Ollama Integration with Google Agent Development Kit
While the Google Agent Development Kit (ADK) is designed to work seamlessly with Google’s own large language models (LLMs), real-world scenarios often require the flexibility to integrate with other models. This is where the ADK’s LiteLlm component becomes invaluable, allowing you to connect to third-party LLMs, including those hosted locally with Ollama.
This guide demonstrates how to set up and integrate a local Ollama server and model with the Google ADK.
Setting Up a Local Ollama Server
First, you need to have Ollama running on your local machine. You can download and install it from the official Ollama website.

Once installed, Ollama provides a command-line interface (CLI) and a simple web UI to manage and run models. The easiest way to download an LLM is to ask a question in the web UI; Ollama will then prompt you to select and download a model from its library. You can also pull a model directly from the terminal with ollama pull gemma3:1b. For this example, we will use the gemma3:1b model: its small size makes it well suited to running on a CPU with limited RAM.

You can verify that a model has been successfully installed by using the ollama list command in your terminal:
```
C:\Users\Administrator>ollama
Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.

C:\Users\Administrator>ollama list
NAME         ID              SIZE      MODIFIED
gemma3:1b    8648f39daa8f    815 MB    About an hour ago
```
This confirms that the gemma3:1b model is ready to be used.
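You can also check the server programmatically. The short sketch below assumes Ollama's default REST endpoint at http://localhost:11434 and its /api/tags route, which returns the locally installed models as JSON; adjust the host and port if your setup differs.

```python
import json
import urllib.request

# Ollama's default local endpoint; change this if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434"

try:
    # /api/tags lists the models installed on the local Ollama server.
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=5) as response:
        models = json.load(response).get("models", [])
    print("Installed models:", [m["name"] for m in models])
except OSError as exc:
    print(f"Could not reach the Ollama server at {OLLAMA_URL}: {exc}")
```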
Creating an Agent with a Local LLM
Now, let’s create a Python script that connects the Google ADK to our local Ollama model.
Here is the complete script, litellmagent.py:
```python
import warnings
import asyncio

# The following warning is a known issue in the ADK library and can be safely ignored.
# It occurs because the `SequentialAgent` class in ADK re-defines a field
# that is already present in its parent `BaseAgent` class.
warnings.filterwarnings(
    "ignore",
    message='Field name "config_type" in "SequentialAgent" shadows an attribute in parent "BaseAgent"',
    category=UserWarning,
    module="pydantic._internal._fields",
)

from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

# Define the Ollama model to be used.
# The format is "ollama_chat/model_name".
# Ensure the model is running on your local machine.
ollama_model = LiteLlm(model="ollama_chat/gemma3:1b")

# Create a simple agent using the Ollama model.
# You can give your agent a name and instructions.
ollama_agent = Agent(
    name="LocalOllamaAgent",
    model=ollama_model,
    instruction="You are a helpful assistant that uses a local Ollama model to answer questions.",
)


async def main():
    """Sets up the runner and session to interact with the agent."""
    # Define session details
    app_name = "ollama_app"
    user_id = "user1"
    session_id = "session1"

    # Create a session service and a runner
    session_service = InMemorySessionService()
    await session_service.create_session(
        app_name=app_name, user_id=user_id, session_id=session_id
    )
    runner = Runner(agent=ollama_agent, app_name=app_name, session_service=session_service)

    # Prepare the user's message
    query = "What's the capital of France?"
    content = types.Content(role="user", parts=[types.Part(text=query)])
    print(f"User: {query}")

    # Run the agent asynchronously and get the response
    final_response = "Agent did not produce a final response."
    async for event in runner.run_async(
        user_id=user_id, session_id=session_id, new_message=content
    ):
        if event.is_final_response() and event.content and event.content.parts:
            final_response = event.content.parts[0].text.strip()
            break  # Stop after getting the final response

    # Print the agent's response.
    print(f"Agent: {final_response}")


# Execute the main async function
if __name__ == "__main__":
    try:
        asyncio.run(main())
    except Exception as e:
        print(f"\nAn error occurred: {e}")
        print("Please ensure your local Ollama server is running and the 'gemma3:1b' model is available.")
```
When you run this script, it connects to your local Ollama server, sends the query “What’s the capital of France?”, and prints the agent’s response.
```
(.venv) C:\vscode-python-workspace\adkagent>python litellmagent.py
User: What's the capital of France?
Agent: The capital of France is Paris.
```
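Because the Runner records each exchange in the session, follow-up questions sent with the same session_id carry the earlier turns as context. The ask helper below is a hypothetical extension, not part of the original script; it simply wraps the event loop from main() so several queries can reuse one session:

```python
from google.genai import types


async def ask(runner, user_id, session_id, query):
    """Send one user message through the runner and return the final reply."""
    content = types.Content(role="user", parts=[types.Part(text=query)])
    final_response = "Agent did not produce a final response."
    async for event in runner.run_async(
        user_id=user_id, session_id=session_id, new_message=content
    ):
        if event.is_final_response() and event.content and event.content.parts:
            final_response = event.content.parts[0].text.strip()
            break
    return final_response


# Inside main(), after the runner has been created:
#   print(await ask(runner, user_id, session_id, "What's the capital of France?"))
#   print(await ask(runner, user_id, session_id, "What river runs through it?"))
```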
How the Script Works
This script demonstrates the flexibility of the Google ADK by showing how to swap a cloud-based LLM for a locally hosted one.
- Imports and Warning Suppression: The script imports the essential components from google.adk and google.genai to build and run the agent. It also includes a warnings.filterwarnings call to ignore a benign warning from the underlying Pydantic library, keeping the console output clean.
- Model Configuration (LiteLlm): The core of the local integration is the LiteLlm instance. The line ollama_model = LiteLlm(model="ollama_chat/gemma3:1b") tells the ADK to connect to a local model. The ollama_chat/ prefix specifies that the script should use the Ollama chat completion API, and gemma3:1b is the specific model to be used. (A sketch for pointing LiteLLM at a non-default Ollama server follows this list.)
- Agent Definition: A standard ADK Agent is created. Critically, the model parameter is set to the ollama_model instance, which seamlessly plugs the local model into the agent's reasoning engine.
- Asynchronous main Function: This function handles the setup and execution of the agent. It defines the session details, initializes a Runner to execute the agent's logic, and prepares the user's message in the ADK's types.Content format.
- Execution and Error Handling: The runner.run_async(...) method sends the message to the agent and returns a stream of events. The script iterates through these events, waiting for the final response. The included try...except block prints a helpful error message if the local Ollama server isn't running, guiding the user to troubleshoot the issue.
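One practical note on the model configuration: LiteLLM locates the Ollama server through the OLLAMA_API_BASE environment variable, falling back to the default of http://localhost:11434. If your server listens on a different host or port, a sketch like the following should work (the address shown is a placeholder, not part of the original setup):

```python
import os

from google.adk.models.lite_llm import LiteLlm

# Point LiteLLM at a non-default Ollama server *before* the model is created.
# "http://192.168.1.50:11434" is a placeholder; substitute your own address.
os.environ["OLLAMA_API_BASE"] = "http://192.168.1.50:11434"

ollama_model = LiteLlm(model="ollama_chat/gemma3:1b")
```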
This example showcases how the ADK’s design allows for easy integration with a variety of LLM providers, giving developers the freedom to choose the best model for their application, whether it’s a powerful cloud-based service or a locally hosted solution for privacy, cost, or performance reasons.