Module livekit.plugins.cerebras

Cerebras plugin for LiveKit Agents

LLM support backed by Cerebras fast inference, including payload optimization via gzip compression and msgpack encoding.

Classes

class LLM (*,
model: str | CerebrasChatModels = 'llama3.1-8b',
api_key: NotGivenOr[str] = NOT_GIVEN,
base_url: NotGivenOr[str] = 'https://api.cerebras.ai/v1',
client: openai.AsyncClient | None = None,
user: NotGivenOr[str] = NOT_GIVEN,
temperature: NotGivenOr[float] = NOT_GIVEN,
parallel_tool_calls: NotGivenOr[bool] = NOT_GIVEN,
tool_choice: NotGivenOr[ToolChoice] = NOT_GIVEN,
reasoning_effort: NotGivenOr[ReasoningEffort] = NOT_GIVEN,
safety_identifier: NotGivenOr[str] = NOT_GIVEN,
prompt_cache_key: NotGivenOr[str] = NOT_GIVEN,
top_p: NotGivenOr[float] = NOT_GIVEN,
timeout: httpx.Timeout | None = None,
max_retries: NotGivenOr[int] = NOT_GIVEN,
gzip_compression: bool = True,
msgpack_encoding: bool = True)
class LLM(OpenAILLM):
    def __init__(
        self,
        *,
        model: str | CerebrasChatModels = "llama3.1-8b",
        api_key: NotGivenOr[str] = NOT_GIVEN,
        base_url: NotGivenOr[str] = "https://api.cerebras.ai/v1",
        client: openai.AsyncClient | None = None,
        user: NotGivenOr[str] = NOT_GIVEN,
        temperature: NotGivenOr[float] = NOT_GIVEN,
        parallel_tool_calls: NotGivenOr[bool] = NOT_GIVEN,
        tool_choice: NotGivenOr[ToolChoice] = NOT_GIVEN,
        reasoning_effort: NotGivenOr[ReasoningEffort] = NOT_GIVEN,
        safety_identifier: NotGivenOr[str] = NOT_GIVEN,
        prompt_cache_key: NotGivenOr[str] = NOT_GIVEN,
        top_p: NotGivenOr[float] = NOT_GIVEN,
        timeout: httpx.Timeout | None = None,
        max_retries: NotGivenOr[int] = NOT_GIVEN,
        gzip_compression: bool = True,
        msgpack_encoding: bool = True,
    ):
        """
        Create a new instance of Cerebras LLM.

        ``api_key`` must be set to your Cerebras API key, either using the argument or by setting
        the ``CEREBRAS_API_KEY`` environment variable.

        When ``gzip_compression`` is True (default), request payloads are gzip-compressed,
        which can reduce TTFT for requests with large prompts.

        When ``msgpack_encoding`` is True (default), request payloads are encoded in the msgpack
        binary format instead of JSON.
        """

        cerebras_api_key = _get_api_key(api_key)

        created_client = False
        if client is None and (gzip_compression or msgpack_encoding):
            client = _CerebrasClient(
                use_msgpack=msgpack_encoding,
                use_gzip=gzip_compression,
                api_key=cerebras_api_key,
                base_url=base_url if is_given(base_url) else None,
                max_retries=max_retries if is_given(max_retries) else 0,
                http_client=httpx.AsyncClient(
                    timeout=timeout
                    if timeout
                    else httpx.Timeout(connect=15.0, read=5.0, write=5.0, pool=5.0),
                    follow_redirects=True,
                    limits=httpx.Limits(
                        max_connections=50,
                        max_keepalive_connections=50,
                        keepalive_expiry=120,
                    ),
                ),
            )
            created_client = True

        super().__init__(
            model=model,
            api_key=cerebras_api_key,
            base_url=base_url,
            client=client,
            user=user,
            temperature=temperature,
            parallel_tool_calls=parallel_tool_calls,
            tool_choice=tool_choice,
            reasoning_effort=reasoning_effort,
            safety_identifier=safety_identifier,
            prompt_cache_key=prompt_cache_key,
            top_p=top_p,
            timeout=timeout,
            max_retries=max_retries,
            _strict_tool_schema=False,
        )

        if created_client:
            self._owns_client = True


Create a new instance of Cerebras LLM.

api_key must be set to your Cerebras API key, either using the argument or by setting the CEREBRAS_API_KEY environment variable.

When gzip_compression is True (default), request payloads are gzip-compressed, which can reduce TTFT for requests with large prompts.

When msgpack_encoding is True (default), request payloads are encoded in the msgpack binary format instead of JSON.
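The effect of the gzip option can be illustrated with a minimal, self-contained sketch using only the standard-library `gzip` and `json` modules (this is not the plugin's actual wire code, and the payload shown is a hypothetical stand-in for a chat-completions request): a large, repetitive prompt compresses dramatically, so far fewer bytes cross the wire before the server can start generating tokens. msgpack serves a similar goal by replacing JSON's textual overhead with a compact binary encoding.

```python
import gzip
import json

# Hypothetical chat-completions payload with a large prompt, standing in
# for the kind of request body the plugin sends to the inference API.
payload = {
    "model": "llama3.1-8b",
    "messages": [{"role": "user", "content": "shared context " * 5000}],
}

raw = json.dumps(payload).encode("utf-8")
compressed = gzip.compress(raw)

# Repetitive prompt text compresses very well, which is why gzip-compressing
# the request can reduce time-to-first-token (TTFT) for large prompts.
print(f"JSON bytes: {len(raw)}, gzipped bytes: {len(compressed)}")
```

In practice both optimizations are enabled by default; pass `gzip_compression=False` or `msgpack_encoding=False` to the constructor to fall back to plain JSON over the wire.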

Ancestors

  • livekit.plugins.openai.llm.LLM
  • livekit.agents.llm.llm.LLM
  • abc.ABC
  • EventEmitter
  • typing.Generic

Inherited members