# Google Gemini LLM

> How to use Google Gemini models with LiveKit Agents.

- **[Use in Agent Builder](https://cloud.livekit.io/projects/p_/agents/builder/new?llm=google%2Fgemini-3-flash-preview)**: Create a new agent in your browser using google/gemini-3-flash-preview

## Overview

Google Gemini models are available in LiveKit Agents through [LiveKit Inference](https://docs.livekit.io/agents/models/inference.md) and the [Gemini plugin](#plugin). With LiveKit Inference, your agent runs on LiveKit's infrastructure to minimize latency. No separate provider API key is required, and usage and rate limits are managed through LiveKit Cloud. Use the plugin instead if you want to manage your own billing and rate limits. Pricing for LiveKit Inference is available on the [pricing page](https://livekit.com/pricing/inference#llm).

## LiveKit Inference

Use [LiveKit Inference](https://docs.livekit.io/agents/models/inference.md) to access Gemini models without a separate Google API key.

| Model name | Model ID | Providers |
| ---------- | -------- | -------- |
| Gemini 2.0 Flash | `google/gemini-2.0-flash` | `google` |
| Gemini 2.0 Flash-Lite | `google/gemini-2.0-flash-lite` | `google` |
| Gemini 2.5 Flash | `google/gemini-2.5-flash` | `google` |
| Gemini 2.5 Flash-Lite | `google/gemini-2.5-flash-lite` | `google` |
| Gemini 2.5 Pro | `google/gemini-2.5-pro` | `google` |
| Gemini 3 Flash | `google/gemini-3-flash-preview` | `google` |
| Gemini 3 Pro | `google/gemini-3-pro-preview` | `google` |
| Gemini 3.1 Flash Lite | `google/gemini-3.1-flash-lite` | `google` |
| Gemini 3.5 Flash | `google/gemini-3.5-flash` | `google` |
| Gemini 3.1 Pro | `google/gemini-3.1-pro-preview` | `google` |

### Usage

To use Gemini, use the `LLM` class from the `inference` module. You can use this LLM in the [Voice AI quickstart](https://docs.livekit.io/agents/start/voice-ai.md):

**Python**:

```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    llm=inference.LLM(
        model="google/gemini-2.5-flash-lite",
        extra_kwargs={
            "max_completion_tokens": 1000
        }
    ),
    # ... tts, stt, vad, turn_handling, etc.
)

```

---

**Node.js**:

```typescript
import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
    llm: new inference.LLM({ 
        model: "google/gemini-2.5-flash-lite", 
        modelOptions: { 
            max_completion_tokens: 1000 
        }
    }),
    // ... tts, stt, vad, turnHandling, etc.
});

```

### Parameters

The following are parameters for configuring Gemini models with LiveKit Inference. For model behavior parameters like `temperature` and `max_completion_tokens`, see [model parameters](#model-parameters).

- **`model`** _(string)_: The model ID from the [models list](#inference).

- **`provider`** _(string)_ (optional): Set a specific provider to use for the LLM. Refer to the [models list](#inference) for available providers. If not set, LiveKit Inference uses the best available provider, and bills accordingly.

- **`extra_kwargs`** _(dict)_ (optional): Additional parameters to pass to the Gemini Chat Completions API, such as `max_tokens` or `temperature`. See [model parameters](#model-parameters) for supported fields.

In Node.js this parameter is called `modelOptions`.

#### Model parameters

Pass the following parameters inside `extra_kwargs` (Python) or `modelOptions` (Node.js). For more details about each parameter in the list, see [Inference parameters](https://docs.livekit.io/reference/agents/inference-llm-parameters.md).

| Parameter | Type | Default | Notes |
| --------- | ---- | ------- | ----- |
| temperature | `float` | `1` | Controls the randomness of the model's output. Valid range: `0`-`2`. |
| top_p | `float` | `1` | Alternative to `temperature`. Valid range: `0`-`1`. |
| frequency_penalty | `float` | `0` | Reduces the model's likelihood to repeat tokens that have already appeared. Valid range: `-2.0`-`2.0`. |
| presence_penalty | `float` | `0` | Increases the model's likelihood to introduce new topics. Valid range: `-2.0`-`2.0`. |
| max_completion_tokens | `int` |  | Maximum number of tokens to generate. Also accepted as `max_tokens`. |
| seed | `int` |  | Enables deterministic sampling. The system makes a best effort to return the same result for identical requests. |
| stop | `str \| list[str]` |  | Sequences that stop generation. |
| tool_choice | `ToolChoice \| Literal['auto', 'required', 'none']` | `"auto"` | Controls how the model uses tools. |
| reasoning_effort | `"low" \| "medium" \| "high"` |  | Controls thinking depth. Maps to thinking token budgets of `1024`, `8192`, and `24576` respectively. Only supported by thinking-capable models. |
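
The ranges in the table can be checked locally before a request is made. The following is a minimal sketch, not part of the LiveKit API: `build_model_params`, `RANGES`, and `REASONING_BUDGETS` are hypothetical names, and the limits are copied from the table above. The returned dict is meant to be passed as `extra_kwargs`.

```python
# Hypothetical helper (not part of LiveKit): validates values against the
# ranges listed in the model parameters table.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}

# Thinking token budgets that reasoning_effort maps to, per the table.
REASONING_BUDGETS = {"low": 1024, "medium": 8192, "high": 24576}


def build_model_params(**params):
    """Return a dict suitable for extra_kwargs, or raise ValueError."""
    for name, value in params.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}..{high}")
        elif name == "reasoning_effort" and value not in REASONING_BUDGETS:
            raise ValueError(f"unknown reasoning_effort: {value!r}")
    return dict(params)


# The result can be passed as inference.LLM(model=..., extra_kwargs=extra_kwargs).
extra_kwargs = build_model_params(temperature=0.4, max_completion_tokens=1000)
```

Parameters that have no documented range, such as `seed` or `stop`, pass through unchanged.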

### String descriptors

As a shortcut, you can also pass a [model ID](#inference) directly to the `llm` argument in your `AgentSession`:

**Python**:

```python
from livekit.agents import AgentSession

session = AgentSession(
    llm="google/gemini-2.5-flash-lite",
    # ... tts, stt, vad, turn_handling, etc.
)

```

---

**Node.js**:

```typescript
import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
    llm: "google/gemini-2.5-flash-lite",
    // ... tts, stt, vad, turnHandling, etc.
});

```
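
Because a string descriptor is just text, a typo only shows up at runtime. As a rough sketch, you could check the ID against the models table before building the session. `INFERENCE_MODEL_IDS`, `split_model_id`, and `check_model_id` are hypothetical names, and the hardcoded set is a snapshot of the table on this page that might be out of date.

```python
# Model IDs copied from the LiveKit Inference table on this page (a snapshot).
INFERENCE_MODEL_IDS = {
    "google/gemini-2.0-flash",
    "google/gemini-2.0-flash-lite",
    "google/gemini-2.5-flash",
    "google/gemini-2.5-flash-lite",
    "google/gemini-2.5-pro",
    "google/gemini-3-flash-preview",
    "google/gemini-3-pro-preview",
    "google/gemini-3.1-flash-lite",
    "google/gemini-3.5-flash",
    "google/gemini-3.1-pro-preview",
}


def split_model_id(model_id):
    """Split a 'provider/model' descriptor into (provider, model)."""
    provider, sep, name = model_id.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, name


def check_model_id(model_id):
    """Return model_id unchanged if it appears in the table, else raise."""
    if model_id not in INFERENCE_MODEL_IDS:
        raise ValueError(f"unknown LiveKit Inference model ID: {model_id!r}")
    return model_id


# The checked ID can be passed as AgentSession(llm=llm_descriptor, ...).
llm_descriptor = check_model_id("google/gemini-2.5-flash-lite")
```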

## Plugin

LiveKit's plugin support for Google lets you connect directly to Google's Gemini API with your own API key.

Available in:
- [x] Node.js
- [x] Python

### Installation

Install the plugin from PyPI (Python) or npm (Node.js):

**Python**:

```shell
uv add "livekit-agents[google]~=1.5"

```

---

**Node.js**:

```shell
pnpm add @livekit/agents-plugin-google@1.x

```

### Authentication

The Google plugin requires authentication based on your chosen service:

- For Vertex AI, you must set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of the service account key file. For more information about mounting files as secrets when deploying to LiveKit Cloud, see [File-mounted secrets](https://docs.livekit.io/deploy/agents/secrets.md#file-mounted-secrets).
- For the Google Gemini API, set the `GOOGLE_API_KEY` environment variable.
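
A missing credential is easier to diagnose at startup than on the first request. The sketch below reports which of the two environment variables are set. `detect_google_auth` is a hypothetical helper, not part of the plugin, and it only checks that the variables are present, not that the credentials are valid.

```python
import os


def detect_google_auth(env=None):
    """Return the list of services whose credentials are configured."""
    env = os.environ if env is None else env
    configured = []
    # Vertex AI: path to a service account key file.
    if env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        configured.append("vertex_ai")
    # Google Gemini API: plain API key.
    if env.get("GOOGLE_API_KEY"):
        configured.append("gemini_api")
    return configured


if not detect_google_auth():
    print("Set GOOGLE_API_KEY or GOOGLE_APPLICATION_CREDENTIALS before starting the agent.")
```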

### Usage

Use Gemini within an `AgentSession` or as a standalone LLM service. For example, you can use this LLM in the [Voice AI quickstart](https://docs.livekit.io/agents/start/voice-ai.md).

**Python**:

```python
from livekit.agents import AgentSession
from livekit.plugins import google

session = AgentSession(
    llm=google.LLM(
        model="gemini-3-flash-preview",
    ),
    # ... tts, stt, vad, turn_handling, etc.
)

```

---

**Node.js**:

```typescript
import { voice } from '@livekit/agents';
import * as google from '@livekit/agents-plugin-google';

const session = new voice.AgentSession({
    llm: new google.LLM({
        model: "gemini-3-flash-preview",
    }),
    // ... tts, stt, vad, turnHandling, etc.
});

```

### Parameters

This section describes some of the available parameters. For a complete reference of all available parameters, see the [plugin reference](https://docs.livekit.io/reference/python/livekit/plugins/google/index.html.md#livekit.plugins.google.LLM).

- **`model`** _(ChatModels | str)_ (optional) - Default: `gemini-3-flash-preview`: ID of the model to use. For a full list, see [Gemini models](https://ai.google.dev/gemini-api/docs/models/gemini).

- **`api_key`** _(str)_ (optional) - Environment: `GOOGLE_API_KEY`: API key for Google Gemini API. In Node.js this parameter is called `apiKey`.

- **`vertexai`** _(bool)_ (optional) - Default: `false`: True to use [Vertex AI](https://cloud.google.com/vertex-ai); false to use [Google AI](https://cloud.google.com/ai-platform/docs).

- **`project`** _(str)_ (optional) - Environment: `GOOGLE_CLOUD_PROJECT`: Google Cloud project to use (only if using Vertex AI). Required if using Vertex AI and the environment variable isn't set.

- **`location`** _(str)_ (optional) - Environment: `GOOGLE_CLOUD_LOCATION`: Google Cloud location to use (only if using Vertex AI). Required if using Vertex AI and the environment variable isn't set.

- **`temperature`** _(float)_ (optional): Sampling temperature that controls the randomness of the model's output. Higher values make the output more random, while lower values make it more focused and deterministic.

Valid values are between `0` and `2`, although the supported range can vary by model. To learn more, see [`generationConfig.temperature`](https://ai.google.dev/api/generate-content#generationconfig) in the Gemini API reference.

- **`thinking_config`** _(ThinkingConfig)_ (optional): Configuration for the model's thinking mode, if supported.

Gemini 2.5 models accept a `thinkingBudget` integer (token budget for reasoning). Set to `0` to disable thinking, `-1` for dynamic thinking, or a value up to the model's maximum (24576 for Flash). Thinking can't be disabled on Gemini 2.5 Pro.

Gemini 3 models use `thinkingLevel` instead, with values `"minimal"`, `"low"`, `"medium"`, or `"high"`. For more information, see [Thinking](https://ai.google.dev/gemini-api/docs/thinking) in the Gemini API documentation. For the full type definition, see [`ThinkingConfig`](https://cloud.google.com/vertex-ai/generative-ai/docs/reference/rest/v1/GenerationConfig#ThinkingConfig) in the Vertex AI API reference.

For a usage example, see [Configure thinking](#thinking). In Node.js this parameter is called `thinkingConfig`.

- **`service_tier`** _(ServiceTier | "unspecified" | "flex" | "standard" | "priority")_ (optional): Service tier to use for the request. Routes the call to a specific Google API capacity and billing tier.

In Node.js this parameter is called `serviceTier`.

- **`top_p`** _(float)_ (optional): Nucleus sampling probability. The model considers tokens whose cumulative probability mass is below this value. Use as an alternative to `temperature` for controlling output randomness. Valid values are between `0` and `1`. The default varies by model.

In Node.js this parameter is called `topP`.

- **`media_resolution`** _(MediaResolution | Literal['MEDIA_RESOLUTION_UNSPECIFIED', 'MEDIA_RESOLUTION_LOW', 'MEDIA_RESOLUTION_MEDIUM', 'MEDIA_RESOLUTION_HIGH'])_ (optional): Token budget for image inputs. Lower values cut token cost in exchange for less visual detail. Especially relevant on Gemini 3 models, which default to a much larger per-image budget than 2.5. For exact token counts per model, see [Media resolution](https://ai.google.dev/gemini-api/docs/media-resolution).

In Node.js this parameter is called `mediaResolution`.
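
The `project` and `location` rules above can be checked before the LLM is constructed. This is a minimal sketch under stated assumptions: `resolve_vertex_settings` is a hypothetical helper, and it only mirrors the documented fallback from explicit arguments to the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` environment variables. The example project and location values are placeholders.

```python
import os


def resolve_vertex_settings(project=None, location=None, env=None):
    """Resolve Vertex AI settings from arguments, then environment variables."""
    env = os.environ if env is None else env
    project = project or env.get("GOOGLE_CLOUD_PROJECT")
    location = location or env.get("GOOGLE_CLOUD_LOCATION")
    missing = [
        name
        for name, value in (("project", project), ("location", location))
        if not value
    ]
    if missing:
        raise ValueError(f"Vertex AI requires: {', '.join(missing)}")
    return {"vertexai": True, "project": project, "location": location}


# Placeholder values; the result can be unpacked as google.LLM(model=..., **settings).
settings = resolve_vertex_settings(project="my-project", location="us-central1", env={})
```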

### Configure thinking

Gemini 3 models use a thinking level (`thinking_level` in Python, `thinkingLevel` in Node.js) to control reasoning depth. The following example uses Gemini 3 Flash with the level set to `"medium"`. You can pass `thinking_config` as a plain dict in Python or `thinkingConfig` as a plain object in Node.js; no extra import is required.

**Python**:

```python
from livekit.agents import AgentSession
from livekit.plugins import google

session = AgentSession(
    llm=google.LLM(
        model="gemini-3-flash-preview",
        thinking_config={
            "thinking_level": "medium",
        },
    ),
    # ... tts, stt, vad, turn_handling, etc.
)

```

---

**Node.js**:

```typescript
import { voice } from '@livekit/agents';
import * as google from '@livekit/agents-plugin-google';

const session = new voice.AgentSession({
    llm: new google.LLM({
        model: "gemini-3-flash-preview",
        thinkingConfig: {
            thinkingLevel: "medium",
        },
    }),
    // ... tts, stt, vad, turnHandling, etc.
});

```

For Gemini 2.5 models, use `thinking_budget` (Python) or `thinkingBudget` (Node.js) with an integer token budget instead.
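
The two styles can be wrapped in one function that picks the right key for the model family. The following is a sketch, not plugin code: `build_thinking_config` is a hypothetical helper, matching families by model-name prefix is an assumption, and the limits are the ones stated in the parameter description above (24576 for Gemini 2.5 Flash, and thinking can't be disabled on Gemini 2.5 Pro).

```python
THINKING_LEVELS = {"minimal", "low", "medium", "high"}
FLASH_MAX_BUDGET = 24576  # maximum stated for Gemini 2.5 Flash


def build_thinking_config(model, *, level=None, budget=None):
    """Return a thinking_config dict for the given model name."""
    # Assumption: the model family can be read from the name prefix.
    if model.startswith("gemini-3"):
        if level not in THINKING_LEVELS:
            raise ValueError(f"level must be one of {sorted(THINKING_LEVELS)}")
        return {"thinking_level": level}
    if model.startswith("gemini-2.5"):
        if not isinstance(budget, int) or budget < -1:
            raise ValueError("budget must be -1 (dynamic), 0 (off), or a positive integer")
        if budget == 0 and model.startswith("gemini-2.5-pro"):
            raise ValueError("thinking can't be disabled on Gemini 2.5 Pro")
        if model == "gemini-2.5-flash" and budget > FLASH_MAX_BUDGET:
            raise ValueError(f"budget exceeds {FLASH_MAX_BUDGET} for Gemini 2.5 Flash")
        return {"thinking_budget": budget}
    raise ValueError(f"no thinking rules known for {model!r}")


# The result can be passed as google.LLM(model=..., thinking_config=config).
config = build_thinking_config("gemini-2.5-flash", budget=-1)
```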

### Provider tools

Available in:
- [ ] Node.js
- [x] Python

> ℹ️ **Info**
> 
> The experimental `_gemini_tools` parameter used with Google LLMs has been removed in favor of these provider tools.

Google Gemini supports the following [provider tools](https://docs.livekit.io/agents/logic/tools.md#provider-tools) that enable the model to use built-in capabilities executed on the model server. These tools can be used alongside function tools defined in your agent's codebase.

| Tool | Description | Parameters |
| ---- | ----------- | ---------- |
| `GoogleSearch` | Search Google for up-to-date information. | `exclude_domains`, `blocking_confidence`, `time_range_filter` |
| `GoogleMaps` | Search for places and businesses using Google Maps. | `auth_config`, `enable_widget` |
| `URLContext` | Provide context from URLs. | None |
| `FileSearch` | Search file stores. | `file_search_store_names` (required), `top_k`, `metadata_filter` |
| `ToolCodeExecution` | Execute code snippets. | None |

> 🔥 **Current limitations**
> 
> Currently only the Gemini Live API supports using provider tools along with function tools.
> 
> When using text models, only provider tools _or_ function tools can be used. See [issue #53](https://github.com/google/adk-python/issues/53) for more details.

```python
from livekit.plugins import google

# MyAgent stands in for your own agent class, defined elsewhere.
agent = MyAgent(
    llm=google.LLM(model="gemini-2.5-flash"),
    tools=[google.tools.GoogleSearch()],  # replace with any supported provider tool
)

```
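
The limitation above can be enforced with a small guard before the tool list reaches the agent. This is a sketch only: `validate_tool_mix` and its `live_api` flag are hypothetical, and the caller is responsible for sorting tools into the two groups, because the helper does not inspect them.

```python
def validate_tool_mix(provider_tools, function_tools, *, live_api=False):
    """Return the combined tool list, or raise if the mix is unsupported."""
    # Per the limitation above, only the Gemini Live API accepts both kinds at once.
    if provider_tools and function_tools and not live_api:
        raise ValueError(
            "text models accept provider tools or function tools, not both"
        )
    return list(provider_tools) + list(function_tools)


# Strings stand in for real tool objects such as google.tools.GoogleSearch().
tools = validate_tool_mix(["GoogleSearch"], [])
```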

## Additional resources

The following resources provide more information about using Google Gemini with LiveKit Agents.

- **[Gemini docs](https://ai.google.dev/gemini-api/docs/models/gemini)**: Google Gemini documentation.

- **[Voice AI quickstart](https://docs.livekit.io/agents/start/voice-ai.md)**: Get started with LiveKit Agents and Google Gemini.

- **[Google AI ecosystem guide](https://docs.livekit.io/agents/integrations/google.md)**: Overview of the entire Google AI and LiveKit Agents integration.

---

This document was rendered at 2026-06-07T11:35:11.506Z.
For the latest version of this document, see [https://docs.livekit.io/agents/models/llm/gemini.md](https://docs.livekit.io/agents/models/llm/gemini.md).

To explore all LiveKit documentation, see [llms.txt](https://docs.livekit.io/llms.txt).