Ollama Integration with Google Agent Development Kit
While the Google Agent Development Kit (ADK) is designed to work seamlessly with Google's own large language models (LLMs), real-world scenarios often require the flexibility to integrate with other models. This is where the ADK's `LiteLlm` component becomes invaluable, allowing you to connect to third-party LLMs, including those hosted locally with Ollama.
This guide demonstrates how to set up and integrate a local Ollama server and model with the Google ADK.
Setting Up a Local Ollama Server
First, you need to have Ollama running on your local machine. You can download and install it from the official Ollama website.

Once installed, Ollama provides a command-line interface (CLI) and a simple web UI to manage and run models. Downloading an LLM is easy: the first time you ask a question, Ollama prompts you to select and download a model from its library. For this example, we will use the `gemma3:1b` model, whose small size makes it well suited to running on a CPU with limited RAM.
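If you prefer the terminal, you can fetch the same model explicitly with the `pull` command (listed in the CLI help shown below) and then start it with `run`:

```
C:\Users\Administrator>ollama pull gemma3:1b
C:\Users\Administrator>ollama run gemma3:1b
```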

You can verify that a model has been successfully installed by running the `ollama list` command in your terminal:
```
C:\Users\Administrator>ollama
Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.

C:\Users\Administrator>ollama list
NAME         ID              SIZE      MODIFIED
gemma3:1b    8648f39daa8f    815 MB    About an hour ago
```
This confirms that the `gemma3:1b` model is ready to be used.
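Before wiring the model into the ADK, it can also help to confirm from Python that the Ollama server is reachable. Here is a minimal sketch, assuming Ollama is listening on its default address, `http://localhost:11434`; the `/api/tags` endpoint returns the locally installed models:

```python
import json
import urllib.request

# Default local Ollama endpoint (adjust if your server runs elsewhere).
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"

# Ask the server for its installed models; a connection error here means
# the Ollama server is not running or is listening on a different address.
with urllib.request.urlopen(OLLAMA_TAGS_URL) as response:
    data = json.load(response)

# Print each installed model's name, e.g. "gemma3:1b".
for model in data.get("models", []):
    print(model["name"])
```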
Creating an Agent with a Local LLM
Now, let’s create a Python script that connects the Google ADK to our local Ollama model.
Here is the complete script, `litellmagent.py`:
```python
import warnings
import asyncio

# The following warning is a known issue in the ADK library and can be safely ignored.
# It occurs because the `SequentialAgent` class in ADK re-defines a field
# that is already present in its parent `BaseAgent` class.
warnings.filterwarnings(
    "ignore",
    message='Field name "config_type" in "SequentialAgent" shadows an attribute in parent "BaseAgent"',
    category=UserWarning,
    module="pydantic._internal._fields",
)

from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

# Define the Ollama model to be used.
# The format is "ollama_chat/model_name".
# Ensure the model is running on your local machine.
ollama_model = LiteLlm(model="ollama_chat/gemma3:1b")

# Create a simple agent using the Ollama model.
# You can give your agent a name and instructions.
ollama_agent = Agent(
    name="LocalOllamaAgent",
    model=ollama_model,
    instruction="You are a helpful assistant that uses a local Ollama model to answer questions."
)

async def main():
    """Sets up the runner and session to interact with the agent."""
    # Define session details
    app_name = "ollama_app"
    user_id = "user1"
    session_id = "session1"

    # Create a session service and a runner
    session_service = InMemorySessionService()
    await session_service.create_session(
        app_name=app_name, user_id=user_id, session_id=session_id
    )
    runner = Runner(agent=ollama_agent, app_name=app_name, session_service=session_service)

    # Prepare the user's message
    query = "What's the capital of France?"
    content = types.Content(role="user", parts=[types.Part(text=query)])
    print(f"User: {query}")

    # Run the agent asynchronously and get the response
    final_response = "Agent did not produce a final response."
    async for event in runner.run_async(
        user_id=user_id, session_id=session_id, new_message=content
    ):
        if event.is_final_response() and event.content and event.content.parts:
            final_response = event.content.parts[0].text.strip()
            break  # Stop after getting the final response

    # Print the agent's response.
    print(f"Agent: {final_response}")

# Execute the main async function
if __name__ == "__main__":
    try:
        asyncio.run(main())
    except Exception as e:
        print(f"\nAn error occurred: {e}")
        print("Please ensure your local Ollama server is running and the 'gemma3:1b' model is available.")
```
When you run this script, it connects to your local Ollama server, sends the query “What’s the capital of France?”, and prints the agent’s response.
```
(.venv) C:\vscode-python-workspace\adkagent>python litellmagent.py
User: What's the capital of France?
Agent: The capital of France is Paris.
```
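Because the `Runner` is tied to a session, you can continue the conversation by sending further messages with the same `user_id` and `session_id`. The following sketch assumes it is appended inside `main()` after the first response is printed, reusing the runner and session from the script above; the follow-up query is just an illustration:

```python
    # Ask a follow-up question in the same session; the InMemorySessionService
    # retains the conversation history, so the agent sees the earlier exchange.
    follow_up = "And what is its population?"
    content = types.Content(role="user", parts=[types.Part(text=follow_up)])
    print(f"User: {follow_up}")

    async for event in runner.run_async(
        user_id=user_id, session_id=session_id, new_message=content
    ):
        if event.is_final_response() and event.content and event.content.parts:
            print(f"Agent: {event.content.parts[0].text.strip()}")
            break
```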
How the Script Works
This script demonstrates the flexibility of the Google ADK by showing how to swap a cloud-based LLM for a locally hosted one.
- **Imports and Warning Suppression**: The script imports the essential components from `google.adk` and `google.genai` to build and run the agent. It also includes a `warnings.filterwarnings` call to ignore a benign warning from the underlying Pydantic library, keeping the console output clean.
- **Model Configuration (`LiteLlm`)**: The core of the local integration is the `LiteLlm` instance. The line `ollama_model = LiteLlm(model="ollama_chat/gemma3:1b")` tells the ADK to connect to a local model. The `ollama_chat/` prefix specifies that the script should use the Ollama chat completion API, and `gemma3:1b` is the specific model to be used (a sketch of pointing `LiteLlm` at a non-default Ollama host follows this list).
- **Agent Definition**: A standard ADK `Agent` is created. Critically, the `model` parameter is set to the `ollama_model` instance, which seamlessly plugs the local model into the agent's reasoning engine.
- **Asynchronous `main` Function**: This function handles the setup and execution of the agent. It defines the session details, initializes a `Runner` to execute the agent's logic, and prepares the user's message in the ADK's `types.Content` format.
- **Execution and Error Handling**: The `runner.run_async(...)` method sends the message to the agent and returns a stream of events. The script iterates through these events, waiting for the final response. The included `try...except` block prints a helpful error message if the local Ollama server isn't running, guiding the user to troubleshoot the issue.
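As referenced in the model-configuration item above, LiteLLM can also target an Ollama server that is not on the default local address. The sketch below assumes that ADK's `LiteLlm` wrapper forwards extra keyword arguments such as `api_base` to LiteLLM, and the host address shown is a placeholder for your own setup; verify the details against your LiteLLM and ADK versions:

```python
from google.adk.models.lite_llm import LiteLlm

# Point LiteLLM at an Ollama server on another machine or port.
# The api_base value below is a placeholder; adjust it to your setup.
remote_ollama_model = LiteLlm(
    model="ollama_chat/gemma3:1b",
    api_base="http://192.168.1.50:11434",
)
```

LiteLLM also reads an `OLLAMA_API_BASE` environment variable for its Ollama provider, which avoids hard-coding the address in the script.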
This example showcases how the ADK’s design allows for easy integration with a variety of LLM providers, giving developers the freedom to choose the best model for their application, whether it’s a powerful cloud-based service or a locally hosted solution for privacy, cost, or performance reasons.