Module livekit.plugins.cerebras

Cerebras plugin for LiveKit Agents

LLM support backed by Cerebras fast inference, including payload optimization via gzip compression and msgpack encoding.

Classes

class LLM (*,
model: str | CerebrasChatModels = 'llama3.1-8b',
api_key: NotGivenOr[str] = NOT_GIVEN,
base_url: NotGivenOr[str] = 'https://api.cerebras.ai/v1',
client: openai.AsyncClient | None = None,
user: NotGivenOr[str] = NOT_GIVEN,
temperature: NotGivenOr[float] = NOT_GIVEN,
parallel_tool_calls: NotGivenOr[bool] = NOT_GIVEN,
tool_choice: NotGivenOr[ToolChoice] = NOT_GIVEN,
reasoning_effort: NotGivenOr[ReasoningEffort] = NOT_GIVEN,
safety_identifier: NotGivenOr[str] = NOT_GIVEN,
prompt_cache_key: NotGivenOr[str] = NOT_GIVEN,
top_p: NotGivenOr[float] = NOT_GIVEN,
timeout: httpx.Timeout | None = None,
max_retries: NotGivenOr[int] = NOT_GIVEN,
gzip_compression: bool = True,
msgpack_encoding: bool = True)
class LLM(OpenAILLM):
    def __init__(
        self,
        *,
        model: str | CerebrasChatModels = "llama3.1-8b",
        api_key: NotGivenOr[str] = NOT_GIVEN,
        base_url: NotGivenOr[str] = "https://api.cerebras.ai/v1",
        client: openai.AsyncClient | None = None,
        user: NotGivenOr[str] = NOT_GIVEN,
        temperature: NotGivenOr[float] = NOT_GIVEN,
        parallel_tool_calls: NotGivenOr[bool] = NOT_GIVEN,
        tool_choice: NotGivenOr[ToolChoice] = NOT_GIVEN,
        reasoning_effort: NotGivenOr[ReasoningEffort] = NOT_GIVEN,
        safety_identifier: NotGivenOr[str] = NOT_GIVEN,
        prompt_cache_key: NotGivenOr[str] = NOT_GIVEN,
        top_p: NotGivenOr[float] = NOT_GIVEN,
        timeout: httpx.Timeout | None = None,
        max_retries: NotGivenOr[int] = NOT_GIVEN,
        gzip_compression: bool = True,
        msgpack_encoding: bool = True,
    ):
        """
        Create a new instance of Cerebras LLM.

        ``api_key`` must be set to your Cerebras API key, either using the argument or by setting
        the ``CEREBRAS_API_KEY`` environment variable.

        When ``gzip_compression`` is True (default), request payloads are gzip-compressed,
        which can reduce TTFT for requests with large prompts.

        When ``msgpack_encoding`` is True (default), request payloads are encoded in the msgpack
        binary format instead of JSON.
        """

        cerebras_api_key = _get_api_key(api_key)

        created_client = False
        if client is None and (gzip_compression or msgpack_encoding):
            client = _CerebrasClient(
                use_msgpack=msgpack_encoding,
                use_gzip=gzip_compression,
                api_key=cerebras_api_key,
                base_url=base_url if is_given(base_url) else None,
                max_retries=max_retries if is_given(max_retries) else 0,
                http_client=httpx.AsyncClient(
                    timeout=timeout
                    if timeout
                    else httpx.Timeout(connect=15.0, read=5.0, write=5.0, pool=5.0),
                    follow_redirects=True,
                    limits=httpx.Limits(
                        max_connections=50,
                        max_keepalive_connections=50,
                        keepalive_expiry=120,
                    ),
                ),
            )
            created_client = True

        super().__init__(
            model=model,
            api_key=cerebras_api_key,
            base_url=base_url,
            client=client,
            user=user,
            temperature=temperature,
            parallel_tool_calls=parallel_tool_calls,
            tool_choice=tool_choice,
            reasoning_effort=reasoning_effort,
            safety_identifier=safety_identifier,
            prompt_cache_key=prompt_cache_key,
            top_p=top_p,
            timeout=timeout,
            max_retries=max_retries,
            _strict_tool_schema=False,
        )

        if created_client:
            self._owns_client = True


Create a new instance of Cerebras LLM.

api_key must be set to your Cerebras API key, either using the argument or by setting the CEREBRAS_API_KEY environment variable.

When gzip_compression is True (default), request payloads are gzip-compressed, which can reduce TTFT for requests with large prompts.

When msgpack_encoding is True (default), request payloads are encoded in the msgpack binary format instead of JSON.
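The effect of the gzip option can be illustrated with a minimal, self-contained sketch using only the standard-library `gzip` and `json` modules (this is not the plugin's actual wire code, and the payload shown is a hypothetical stand-in for a chat-completions request): a large, repetitive prompt compresses dramatically, so far fewer bytes cross the wire before the server can start generating tokens. msgpack serves a similar goal by replacing JSON's textual overhead with a compact binary encoding.

```python
import gzip
import json

# Hypothetical chat-completions payload with a large prompt, standing in
# for the kind of request body the plugin sends to the inference API.
payload = {
    "model": "llama3.1-8b",
    "messages": [{"role": "user", "content": "shared context " * 5000}],
}

raw = json.dumps(payload).encode("utf-8")
compressed = gzip.compress(raw)

# Repetitive prompt text compresses very well, which is why gzip-compressing
# the request can reduce time-to-first-token (TTFT) for large prompts.
print(f"JSON bytes: {len(raw)}, gzipped bytes: {len(compressed)}")
```

In practice both optimizations are enabled by default; pass `gzip_compression=False` or `msgpack_encoding=False` to the constructor to fall back to plain JSON over the wire.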

Ancestors

  • livekit.plugins.openai.llm.LLM
  • livekit.agents.llm.llm.LLM
  • abc.ABC
  • EventEmitter
  • typing.Generic

Inherited members