Module livekit.plugins.cerebras
Cerebras plugin for LiveKit Agents
Support for LLMs with Cerebras' fast inference, including request payload optimization via gzip compression and msgpack encoding.
Classes
class LLM (*,
model: str | CerebrasChatModels = 'llama3.1-8b',
api_key: NotGivenOr[str] = NOT_GIVEN,
base_url: NotGivenOr[str] = 'https://api.cerebras.ai/v1',
client: openai.AsyncClient | None = None,
user: NotGivenOr[str] = NOT_GIVEN,
temperature: NotGivenOr[float] = NOT_GIVEN,
parallel_tool_calls: NotGivenOr[bool] = NOT_GIVEN,
tool_choice: NotGivenOr[ToolChoice] = NOT_GIVEN,
reasoning_effort: NotGivenOr[ReasoningEffort] = NOT_GIVEN,
safety_identifier: NotGivenOr[str] = NOT_GIVEN,
prompt_cache_key: NotGivenOr[str] = NOT_GIVEN,
top_p: NotGivenOr[float] = NOT_GIVEN,
timeout: httpx.Timeout | None = None,
max_retries: NotGivenOr[int] = NOT_GIVEN,
gzip_compression: bool = True,
msgpack_encoding: bool = True)
```python
class LLM(OpenAILLM):
    def __init__(
        self,
        *,
        model: str | CerebrasChatModels = "llama3.1-8b",
        api_key: NotGivenOr[str] = NOT_GIVEN,
        base_url: NotGivenOr[str] = "https://api.cerebras.ai/v1",
        client: openai.AsyncClient | None = None,
        user: NotGivenOr[str] = NOT_GIVEN,
        temperature: NotGivenOr[float] = NOT_GIVEN,
        parallel_tool_calls: NotGivenOr[bool] = NOT_GIVEN,
        tool_choice: NotGivenOr[ToolChoice] = NOT_GIVEN,
        reasoning_effort: NotGivenOr[ReasoningEffort] = NOT_GIVEN,
        safety_identifier: NotGivenOr[str] = NOT_GIVEN,
        prompt_cache_key: NotGivenOr[str] = NOT_GIVEN,
        top_p: NotGivenOr[float] = NOT_GIVEN,
        timeout: httpx.Timeout | None = None,
        max_retries: NotGivenOr[int] = NOT_GIVEN,
        gzip_compression: bool = True,
        msgpack_encoding: bool = True,
    ):
        """
        Create a new instance of Cerebras LLM.

        ``api_key`` must be set to your Cerebras API key, either using the
        argument or by setting the ``CEREBRAS_API_KEY`` environment variable.

        When ``gzip_compression`` is True (default), request payloads are
        gzip-compressed, which can reduce TTFT for requests with large prompts.

        When ``msgpack_encoding`` is True (default), request payloads are
        encoded with the msgpack binary format instead of JSON.
        """
        cerebras_api_key = _get_api_key(api_key)

        created_client = False
        if client is None and (gzip_compression or msgpack_encoding):
            client = _CerebrasClient(
                use_msgpack=msgpack_encoding,
                use_gzip=gzip_compression,
                api_key=cerebras_api_key,
                base_url=base_url if is_given(base_url) else None,
                max_retries=max_retries if is_given(max_retries) else 0,
                http_client=httpx.AsyncClient(
                    timeout=timeout
                    if timeout
                    else httpx.Timeout(connect=15.0, read=5.0, write=5.0, pool=5.0),
                    follow_redirects=True,
                    limits=httpx.Limits(
                        max_connections=50,
                        max_keepalive_connections=50,
                        keepalive_expiry=120,
                    ),
                ),
            )
            created_client = True

        super().__init__(
            model=model,
            api_key=cerebras_api_key,
            base_url=base_url,
            client=client,
            user=user,
            temperature=temperature,
            parallel_tool_calls=parallel_tool_calls,
            tool_choice=tool_choice,
            reasoning_effort=reasoning_effort,
            safety_identifier=safety_identifier,
            prompt_cache_key=prompt_cache_key,
            top_p=top_p,
            timeout=timeout,
            max_retries=max_retries,
            _strict_tool_schema=False,
        )

        if created_client:
            self._owns_client = True
```
Create a new instance of Cerebras LLM.

`api_key` must be set to your Cerebras API key, either using the argument or by setting the `CEREBRAS_API_KEY` environment variable.

When `gzip_compression` is True (default), request payloads are gzip-compressed, which can reduce TTFT for requests with large prompts.

When `msgpack_encoding` is True (default), request payloads are encoded with the msgpack binary format instead of JSON.

Ancestors
- livekit.plugins.openai.llm.LLM
- livekit.agents.llm.llm.LLM
- abc.ABC
- EventEmitter
- typing.Generic
Inherited members
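The payload optimization described above can be illustrated with a small, self-contained sketch: gzip-compressing a chat-completion-style JSON body with a large prompt. The payload below is illustrative only (it is not taken from the plugin's internals), but it shows why compression helps for large prompts: repetitive natural-language text compresses well, so fewer bytes cross the wire before the server can start inference.

```python
import gzip
import json

# Illustrative chat-completion-style request body; the repeated text
# stands in for a long transcript or document in the prompt.
payload = {
    "model": "llama3.1-8b",
    "messages": [
        {"role": "user", "content": "Summarize this transcript. " * 500},
    ],
}

# Serialize to JSON bytes, then gzip-compress as the plugin does
# (msgpack encoding, the other optimization, needs the third-party
# `msgpack` package and is omitted here).
raw = json.dumps(payload).encode("utf-8")
compressed = gzip.compress(raw)

print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes")
```

For highly repetitive prompts like this one the compressed body is a small fraction of the original; real-world prompts compress less dramatically, but large text payloads still shrink substantially.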