Monday, September 08, 2025

Local Ollama Server Integration with Google Agent Development Kit

While the Google Agent Development Kit (ADK) is designed to work seamlessly with Google’s own large language models (LLMs), real-world scenarios often require the flexibility to integrate with other models. This is where the ADK’s LiteLlm component becomes invaluable, allowing you to connect to third-party LLMs, including those hosted locally with Ollama.

This guide demonstrates how to set up and integrate a local Ollama server and model with the Google ADK.

Setting Up a Local Ollama Server

First, you need to have Ollama running on your local machine. You can download and install it from the official Ollama website.

Once installed, Ollama provides a command-line interface (CLI) and a simple web UI for managing and running models. Downloading an LLM is straightforward: ask a question in the web UI and Ollama will prompt you to select and download a model from its library, or pull one directly from the CLI. For this example, we will use the gemma3:1b model; its small size makes it well suited to running on a CPU with limited RAM.
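
For example, fetching the model and giving it a quick smoke test from the terminal looks roughly like this (the download progress and the model's reply are omitted here and will vary):

C:\Users\Administrator>ollama pull gemma3:1b
C:\Users\Administrator>ollama run gemma3:1b "Say hello in one short sentence."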

You can verify that a model has been successfully installed by using the ollama list command in your terminal:

C:\Users\Administrator>ollama
Usage:
  ollama [flags]
  ollama [command]
 
Available Commands:
  serve       Start ollama
  create      Create a model
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command
 
Flags:
  -h, --help      help for ollama
  -v, --version   Show version information
 
Use "ollama [command] --help" for more information about a command.
 
C:\Users\Administrator>ollama list
NAME          ID             SIZE      MODIFIED
gemma3:1b     8648f39daa8f   815 MB    About an hour ago

This confirms that the gemma3:1b model is ready to be used.
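
Before wiring the model into the ADK, you can optionally confirm that LiteLLM, the library the ADK's LiteLlm wrapper delegates to, can reach the local server. Here is a minimal sketch, assuming the litellm package is installed in your environment (pip install litellm):

import litellm

# Send one chat message to the local Ollama server. The "ollama_chat/" prefix
# selects Ollama's chat API; LiteLLM targets http://localhost:11434 by default.
response = litellm.completion(
    model="ollama_chat/gemma3:1b",
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
)
print(response.choices[0].message.content)

If this prints a response, the server and model are reachable, and the ADK integration below should work as well.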

Creating an Agent with a Local LLM

Now, let’s create a Python script that connects the Google ADK to our local Ollama model.

Here is the complete script, litellmagent.py:

import warnings
import asyncio
 
# The following warning is a known issue in the ADK library and can be safely ignored.
# It occurs because the `SequentialAgent` class in ADK re-defines a field
# that is already present in its parent `BaseAgent` class.
warnings.filterwarnings(
    "ignore",
    message='Field name "config_type" in "SequentialAgent" shadows an attribute in parent "BaseAgent"',
    category=UserWarning,
    module="pydantic._internal._fields",
)
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types
 
# Define the Ollama model to be used.
# The format is "ollama_chat/model_name"
# Ensure the model is running on your local machine.
ollama_model = LiteLlm(model="ollama_chat/gemma3:1b")
 
# Create a simple agent using the Ollama model.
# You can give your agent a name and instructions.
ollama_agent = Agent(
    name="LocalOllamaAgent",
    model=ollama_model,
    instruction="You are a helpful assistant that uses a local Ollama model to answer questions."
)
 
async def main():
    """Sets up the runner and session to interact with the agent."""
    # Define session details
    app_name = "ollama_app"
    user_id = "user1"
    session_id = "session1"
 
    # Create a session service and a runner
    session_service = InMemorySessionService()
    await session_service.create_session(
        app_name=app_name, user_id=user_id, session_id=session_id
    )
    runner = Runner(agent=ollama_agent, app_name=app_name, session_service=session_service)
 
    # Prepare the user's message
    query = "What's the capital of France?"
    content = types.Content(role="user", parts=[types.Part(text=query)])
 
    print(f"User: {query}")
 
    # Run the agent asynchronously and get the response
    final_response = "Agent did not produce a final response."
    async for event in runner.run_async(
        user_id=user_id, session_id=session_id, new_message=content
    ):
        if event.is_final_response() and event.content and event.content.parts:
            final_response = event.content.parts[0].text.strip()
            break  # Stop after getting the final response
 
    # Print the agent's response.
    print(f"Agent: {final_response}")
 
# Execute the main async function
if __name__ == "__main__":
    try:
        asyncio.run(main())
    except Exception as e:
        print(f"\nAn error occurred: {e}")
        print("Please ensure your local Ollama server is running and the 'gemma3:1b' model is available.")

When you run this script, it connects to your local Ollama server, sends the query “What’s the capital of France?”, and prints the agent’s response.

(.venv) C:\vscode-python-workspace\adkagent>python litellmagent.py
User: What's the capital of France?
Agent: The capital of France is Paris.

How the Script Works

This script demonstrates the flexibility of the Google ADK by showing how to swap a cloud-based LLM for a locally hosted one.

  • Imports and Warning Suppression: The script imports essential components from google.adk and google.genai to build and run the agent. It also includes a warnings.filterwarnings call to ignore a benign warning from the underlying Pydantic library, keeping the console output clean.
  • Model Configuration (LiteLlm): The core of the local integration is the LiteLlm instance. The line ollama_model = LiteLlm(model="ollama_chat/gemma3:1b") tells the ADK to connect to a local model. The ollama_chat/ prefix specifies that LiteLLM should use Ollama's chat completion API, and gemma3:1b is the specific model to use (see the configuration sketch after this list for targeting a non-default Ollama host).
  • Agent Definition: A standard ADK Agent is created. Critically, the model parameter is set to the ollama_model instance, which seamlessly plugs the local model into the agent’s reasoning engine.
  • Asynchronous main Function: This function handles the setup and execution of the agent. It defines session details, initializes a Runner to execute the agent’s logic, and prepares the user’s message in the ADK’s types.Content format.
  • Execution and Error Handling: The runner.run_async(...) method sends the message to the agent and returns a stream of events. The script iterates through these events, waiting for the final response. The included try...except block provides a helpful error message if the local Ollama server isn’t running, guiding the user to troubleshoot the issue.
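
The LiteLlm wrapper forwards extra keyword arguments to the underlying LiteLLM completion call, so, assuming your ADK version supports this pass-through, you can also point the agent at an Ollama server that is not running on the default localhost endpoint. Here is a minimal sketch; the host address is a placeholder:

from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm

# Point LiteLLM at a non-default Ollama endpoint. The api_base value is a
# placeholder; substitute the address of your own Ollama server.
remote_ollama_model = LiteLlm(
    model="ollama_chat/gemma3:1b",
    api_base="http://192.168.1.50:11434",  # hypothetical remote host
)

remote_agent = Agent(
    name="RemoteOllamaAgent",
    model=remote_ollama_model,
    instruction="You are a helpful assistant backed by a remote Ollama server."
)

Alternatively, LiteLLM can pick up the endpoint from the OLLAMA_API_BASE environment variable, which avoids hard-coding the URL in the script.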

This example showcases how the ADK’s design allows for easy integration with a variety of LLM providers, giving developers the freedom to choose the best model for their application, whether it’s a powerful cloud-based service or a locally hosted solution for privacy, cost, or performance reasons.
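
As a concrete illustration of that flexibility, swapping providers usually amounts to changing the model argument while the rest of the agent definition stays the same. A sketch, with the Gemini model ID given purely as an example and assuming Google API credentials are configured for the cloud case:

from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm

# Local option: the Ollama model used throughout this post.
local_agent = Agent(
    name="LocalOllamaAgent",
    model=LiteLlm(model="ollama_chat/gemma3:1b"),
    instruction="You are a helpful assistant."
)

# Cloud option: pass a Gemini model ID as a plain string.
cloud_agent = Agent(
    name="CloudGeminiAgent",
    model="gemini-2.0-flash",
    instruction="You are a helpful assistant."
)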