Python SDK Reference
VoiceGateway exposes one public Python surface: the voicegateway.inference module, a drop-in mirror of livekit.agents.inference. New agent code uses it; existing LiveKit Cloud Inference code switches over with one import-line change.
Cost queries, project management, latency stats, and request logs live outside the Python SDK. Use the CLI, the HTTP API, the dashboard, or the MCP tools for those.
Installation
Import
The inference submodule is the only documented public entry point. The internal voicegateway.core.gateway.Gateway class still exists for the CLI, HTTP server, and MCP runtime, but it is not part of the supported Python SDK and may change without notice.
inference.STT
The model string parses as provider/model[:language]. Provider names are validated against the eleven supported types (openai, deepgram, cartesia, anthropic, groq, elevenlabs, assemblyai, ollama, whisper, kokoro, piper). The api_key kwarg, when given, overrides the project’s resolved key for this one instance, which is useful for testing.
api_secret, fallback, and conn_options are accepted for drop-in compatibility but emit a UserWarning.
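The parsing rule for STT model strings can be illustrated with a small standalone parser. This is a sketch under stated assumptions, not the gateway's actual implementation: parse_model_string and ParsedModel are hypothetical names, and the model names in the usage note are only examples. The provider set is the list given above.

```python
from typing import NamedTuple, Optional

# The eleven provider names listed in this section.
SUPPORTED_PROVIDERS = frozenset({
    "openai", "deepgram", "cartesia", "anthropic", "groq", "elevenlabs",
    "assemblyai", "ollama", "whisper", "kokoro", "piper",
})


class ParsedModel(NamedTuple):
    provider: str
    model: str
    language: Optional[str]


def parse_model_string(spec: str) -> ParsedModel:
    """Hypothetical helper: split 'provider/model[:language]' and validate the provider."""
    provider, slash, rest = spec.partition("/")
    if not slash or not provider or not rest:
        raise ValueError(f"expected 'provider/model[:language]', got {spec!r}")
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider {provider!r}")
    if ":" in rest:
        # Only the trailing colon-suffix is treated as the language.
        model, _, language = rest.rpartition(":")
    else:
        model, language = rest, ""
    if not model:
        raise ValueError(f"missing model name in {spec!r}")
    return ParsedModel(provider, model, language or None)
```

With this sketch, parse_model_string("deepgram/nova-3:en") yields provider "deepgram", model "nova-3", and language "en", while an unknown provider prefix raises ValueError before any model lookup happens.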
inference.LLM
Parameters default to None instead of NotGivenOr, matching LiveKit’s LLM shape. There is no fallback, conn_options, or http_session parameter; those are STT/TTS-specific.
inference.TTS
Accepts a voice kwarg. The trailing colon-suffix in the model string parses as the voice, not the language. This is the semantic asymmetry between STT and TTS that LiveKit defines.
Project routing
inference.set_project
asyncio.Task instances.
Resolution order for the active project:
- inference.set_project(name) in the current context.
- VOICEGW_ACTIVE_PROJECT environment variable.
- default_project field in voicegw.yaml.
- The literal "default". The gateway auto-creates a project of this id on first run, so the fallback is always backed by a real row.
inference.get_active_project
Session correlation
inference.start_session
Binds a fresh session_id ("vg-<uuid4>") to the current context. Inside AgentSession this happens automatically: the first factory constructed in a context creates the id, and the others inherit it. The id is written to requests.session_id and accumulates into the sessions table.
The standard livekit-agents worker spawns a fresh task per call, so the ContextVar starts clean and start_session is unnecessary. Worker patterns that handle multiple conversations sequentially in a single asyncio task need to call start_session() at the top of each conversation handler; otherwise the second conversation reuses the first’s id.
asyncio.Task instances created before the session opens get their own ids. Construct factories at session entry, not at module import time.
inference.attach_session (opt-in)
Wires a LiveKit AgentSession into the voice-conversation metrics pipeline: per-turn response speed, talk-over rate, and dead-air detection.
In the standard livekit-agents worker pattern, the metric capture happens automatically through plugin-level hooks on InstrumentedSTT/InstrumentedTTS. attach_session exists for the cases where those hooks miss events: custom AgentSession subclasses, in-process agent harnesses, or test rigs. When in doubt, you don’t need to call it.
Returns the bound session_id so the caller can echo it into its own logs.
Subscribes to five AgentSession events: user_started_speaking, user_stopped_speaking, agent_started_speaking, agent_stopped_speaking, close. The first four feed the TurnTracker. The close event flushes the tracker, stops the DeadAirDetector, and calls CostTracker.close_session(sid), so the aggregate columns (talk_time_seconds, per_minute_cost_usd, response_speed_p50/p95_ms, talk_over_rate) land on the sessions row by the time the dashboard’s /api/metrics endpoint reads it.
Components default to the process-level registry the Gateway populates on startup; pass explicit kwargs to override (the unit-test path).
Tenant attribution
VoiceGateway tags each session with an optional tenant_id so multi-tenant operators can slice costs, metrics, and replay by customer. The tenant flows through three independent surfaces; pick the one that matches your deployment.
1. attach_session(..., tenant_id="…")
The opt-in path. Pass tenant_id at the same time you wire the LiveKit AgentSession, and every cost row, metric row, and replay event from that session lands tagged.
When tenant_id is omitted (default None), the ContextVar is left alone, so a virtual key resolved earlier in the request (see surface 3 below) or an explicit set_tenant(...) call still wins. Calling with tenant_id=None does not clear a previously set scope.
2. inference.set_tenant(tenant_id)
The escape hatch for code that does not own the AgentSession construction. Sets the tenant_id_ctx ContextVar for the rest of the async context; the next log_request call picks it up and stamps the session row. 128-char UTF-8 cap.
inference.current_tenant() reads the current scope without modifying it. inference.reset_tenant_id() clears the ContextVar for a new session boundary inside the same task.
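The interaction between set_tenant, current_tenant, reset_tenant_id, and the tenant_id argument of attach_session can be modelled with a single ContextVar. The functions below are simplified stand-ins that share the documented names; only the tenant handling is modelled. How the 128-character cap is enforced is an assumption here: this sketch raises ValueError, and the real SDK may behave differently.

```python
from contextvars import ContextVar
from typing import Optional

MAX_TENANT_ID_CHARS = 128
_tenant_id: ContextVar[Optional[str]] = ContextVar("tenant_id", default=None)


def set_tenant(tenant_id: str) -> None:
    # Assumption: over-long ids are rejected rather than truncated.
    if len(tenant_id) > MAX_TENANT_ID_CHARS:
        raise ValueError(f"tenant_id exceeds {MAX_TENANT_ID_CHARS} characters")
    _tenant_id.set(tenant_id)


def current_tenant() -> Optional[str]:
    """Read the current scope without modifying it."""
    return _tenant_id.get()


def reset_tenant_id() -> None:
    """Clear the scope at a new session boundary inside the same task."""
    _tenant_id.set(None)


def attach_session(session: object, tenant_id: Optional[str] = None) -> None:
    # Stand-in: event wiring is omitted. None leaves any earlier scope untouched.
    if tenant_id is not None:
        set_tenant(tenant_id)
```

Calling set_tenant("acme") followed by attach_session(session) leaves the scope at "acme"; only an explicit tenant_id argument or reset_tenant_id() changes it.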
3. Scoped API keys (no code change required)
When a request authenticates with an API key whose tenant_id is set, the HTTP API’s auth middleware (in src/voicegateway/server/main.py::build_app) auto-tags the session with the key’s scope. Agent code does not need to know the tenant: the dashboard’s API Keys page (/api-keys) issues a scoped key, the operator ships it as Authorization: Bearer <key>, and every request inherits the scope.
Body-level tenant_id is rejected with 403 when it conflicts with a scoped virtual key. Unscoped virtual keys (issued without a tenant) allow the body to declare any tenant, matching the static-key behavior.
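The conflict rule for body-level tenants reduces to a small decision function. The sketch below is illustrative: resolve_request_tenant and TenantConflict are hypothetical names, and the real middleware may structure the check differently.

```python
from typing import Optional


class TenantConflict(Exception):
    """Hypothetical error that an HTTP layer would map to a 403 response."""

    status_code = 403


def resolve_request_tenant(
    key_tenant: Optional[str],
    body_tenant: Optional[str],
) -> Optional[str]:
    """Decide the effective tenant from the key's scope and the request body."""
    if key_tenant is None:
        # Unscoped key: the body may declare any tenant, or none.
        return body_tenant
    if body_tenant is not None and body_tenant != key_tenant:
        raise TenantConflict(
            f"body tenant {body_tenant!r} conflicts with key scope {key_tenant!r}"
        )
    # Scoped key: the key's tenant applies, whether or not the body repeats it.
    return key_tenant
```

A scoped key therefore pins the tenant, and the body can only restate it; an unscoped key defers to the body.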
The “unattributed” bucket
Sessions where none of the three surfaces set a tenant get tenant_id = NULL in storage. The dashboard renders these as a muted “unattributed” pill rather than a literal tenant string.
The first tenant-bearing request “wins” for the session’s lifetime. A later unattributed request on the same session_id does not clear the tenant_id (the sessions UPSERT uses COALESCE(tenant_id, excluded.tenant_id)).
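The first-writer-wins behavior of COALESCE(tenant_id, excluded.tenant_id) can be reproduced with an in-memory SQLite database. The two-column table below is a trimmed stand-in for the real sessions schema, and log_request here is a simplified helper written for this example. It requires SQLite 3.24 or later for UPSERT support.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# Trimmed stand-in for the sessions table: only the columns the example needs.
conn.execute("CREATE TABLE sessions (session_id TEXT PRIMARY KEY, tenant_id TEXT)")

UPSERT = """
INSERT INTO sessions (session_id, tenant_id)
VALUES (?, ?)
ON CONFLICT(session_id) DO UPDATE SET
    tenant_id = COALESCE(tenant_id, excluded.tenant_id)
"""


def log_request(session_id: str, tenant_id: Optional[str]) -> None:
    """Upsert the session row; an existing non-NULL tenant is never overwritten."""
    conn.execute(UPSERT, (session_id, tenant_id))


def tenant_of(session_id: str) -> Optional[str]:
    row = conn.execute(
        "SELECT tenant_id FROM sessions WHERE session_id = ?", (session_id,)
    ).fetchone()
    return row[0] if row else None
```

In the UPDATE clause, the unqualified tenant_id refers to the stored row and excluded.tenant_id to the incoming one, so a stored NULL is filled by the first tenant-bearing request and every later value, NULL or not, is ignored.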
For the operator-facing workflow (issuing keys, viewing per-tenant costs, exporting), see the multi-tenant quickstart.
Cross-modality routing
Each project carries a latency budget and a per-modality provider roster. When a session starts, VoiceGateway picks the (STT, LLM, TTS) combination from the roster that minimises predicted total latency under the budget. The pick is recorded on the session row so the dashboard can show what ran and how close the call landed to the budget.
How the router picks
At session start, the router reads three inputs and produces a RoutedTriple plus a budget_overrun boolean.
- Observed p50 per (provider, modality): rolled up by the 15-minute worker from the requests table, written to latency_observations. The router prefers observed data when present.
- Curated published-median baselines in src/voicegateway/core/provider_baselines.json. Used when no observation exists for a candidate. Operators can edit the JSON to update a published median or add a missing provider.
- Caller overrides: explicit {modality: provider} map passed from the agent code. The router respects overrides for the named modalities and only picks for the unset ones.
If no combination fits the budget and fallback_to_fastest=true, the router picks the fastest available combination and flags budget_overrun=true. If fallback_to_fastest=false, it raises BudgetExceeded.
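The selection rule described in this section can be sketched as an exhaustive search over the roster. This is a minimal illustration, not the router's actual code: pick_triple and predicted_latency are hypothetical names, the fields on this RoutedTriple are assumed, and predicting total latency as the sum of the three per-modality p50 values is also an assumption.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Mapping, Optional, Tuple

MODALITIES = ("stt", "llm", "tts")
LatencyTable = Mapping[Tuple[str, str], float]  # (provider, modality) -> p50 in ms


class BudgetExceeded(Exception):
    pass


@dataclass(frozen=True)
class RoutedTriple:  # field layout is assumed for this sketch
    stt: str
    llm: str
    tts: str
    predicted_ms: float


def predicted_latency(provider: str, modality: str,
                      observed: LatencyTable, baselines: LatencyTable) -> float:
    """Prefer observed p50; fall back to the published baseline."""
    key = (provider, modality)
    return observed[key] if key in observed else baselines[key]


def pick_triple(roster: Mapping[str, List[str]], observed: LatencyTable,
                baselines: LatencyTable, budget_ms: float, fallback_to_fastest: bool,
                overrides: Optional[Dict[str, str]] = None) -> Tuple[RoutedTriple, bool]:
    overrides = overrides or {}
    # An override pins its modality to one provider; the rest use the roster.
    choices = [[overrides[m]] if m in overrides else list(roster[m]) for m in MODALITIES]
    scored = []
    for combo in product(*choices):
        total = sum(predicted_latency(p, m, observed, baselines)
                    for p, m in zip(combo, MODALITIES))
        scored.append(RoutedTriple(combo[0], combo[1], combo[2], total))
    fastest = min(scored, key=lambda t: t.predicted_ms)
    if fastest.predicted_ms <= budget_ms:
        return fastest, False
    if fallback_to_fastest:
        return fastest, True  # budget_overrun flag
    raise BudgetExceeded(f"fastest combination needs {fastest.predicted_ms} ms")
```

Because the objective is minimum total latency, the fastest combination overall is also the best one under the budget whenever any combination fits, which is why a single min() covers both the normal and the fallback path.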
Explicit overrides from agent code
Pass the caller-override dict through whatever surface attaches the session. The reference path is route_session(...) returning a RoutedTriple, with the caller then handing the triple to attach_session(routed_triple=...).
Inspecting what the router would pick
For ops debugging, voicegw route show <project> prints the current observations and rosters, and voicegw route simulate <project> [--stt X] [--llm Y] dry-runs the picker without writing a session row. Both accept --json for scripting.
For the agency-facing operator workflow (tuning budgets, uploading branding, exporting per-project data), see the agency quickstart.
Voice-specific guardrails
Guardrails are project-scoped and injected through the existing drop-in voicegateway.inference.LLM(...) path. No separate session-create service is required.
The reserved tool name is report_guardrail_action(category, action, context_excerpt). User-defined tools with that name are rejected when guardrails are active.
Bypass is explicit and audited: a bypass writes a bypassed audit row when a policy would otherwise be active. See the guardrails guide and prompt reference.
Conversation replay capture
VoiceGateway captures a per-event timeline for every voice conversation: each STT chunk, each LLM token, each TTS frame, plus periodic conversation-state snapshots. The dashboard’s Replay page then scrubs through any past call moment-by-moment with cost accruing live. This happens automatically; users do not call any function to opt in.
The capture path runs alongside the metrics pipeline. The same attach_session helper covered above wires replay events into the ReplayCapture buffer on the standard worker pattern. Custom AgentSession subclasses use the same opt-in escape hatch.
Defaults and per-project knobs
Replay capture defaults live under each project’s replay: block in voicegw.yaml:
replay: accepts the defaults shown above. The enabled toggle disables capture for the project (cost and metrics aggregates continue as before); the other three tune the storage/memory trade-off documented in docs/storage/replay-storage-costs.md.
Disabling capture
For projects that should not record replay (sensitive content, regulatory constraint, storage cost concerns), set enabled: false:
Retention worker
The RetentionWorker runs once an hour as a background asyncio task; it reads each project’s retention_days and deletes replay rows tied to sessions whose ended_at is older than the window. Single-process; multi-replica coordination is out of scope.
The dashboard’s POST /api/projects/{id}/replay/retention endpoint updates retention_days in memory for the current process. The change applies on the next worker tick. Persistence to voicegw.yaml on disk is a future follow-up; restarting the gateway reverts to the file-defined value.
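One retention pass, as described in this section, amounts to a dated delete per project. The sketch below shows that step against SQLite. It is an illustration under assumptions: purge_expired_replay is a hypothetical name, the replay_events table and the project_id column are invented for the example, and ended_at is assumed to be stored as a UTC ISO-8601 string with second precision.

```python
import sqlite3
from datetime import datetime, timedelta


def purge_expired_replay(
    conn: sqlite3.Connection,
    project_id: str,
    retention_days: int,
    now: datetime,
) -> int:
    """Delete replay rows for sessions that ended before the retention window.

    Returns the number of replay rows removed. Sessions that have not ended
    (ended_at IS NULL) are never touched.
    """
    # Same string format as the stored values, so text comparison orders by time.
    cutoff = (now - timedelta(days=retention_days)).isoformat(timespec="seconds")
    cur = conn.execute(
        """
        DELETE FROM replay_events
        WHERE session_id IN (
            SELECT session_id FROM sessions
            WHERE project_id = ?
              AND ended_at IS NOT NULL
              AND ended_at < ?
        )
        """,
        (project_id, cutoff),
    )
    conn.commit()
    return cur.rowcount
```

An hourly worker would call a function like this once per project, reading retention_days at the start of each tick; that matches the note above that an in-memory change takes effect on the next tick.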
Operations: where to go
| You want to | Use this |
|---|---|
| List projects | voicegw projects (CLI), GET /v1/projects (HTTP), list_projects (MCP) |
| See costs | voicegw costs (CLI), GET /v1/costs (HTTP), get_costs (MCP), the dashboard |
| Tail recent requests | voicegw logs (CLI), GET /v1/logs (HTTP), get_logs (MCP) |
| Add or rotate a provider key | vg_add_provider / vg_set_provider_key (MCP), the dashboard Providers page |
| Reconcile against an invoice | voicegw reconcile --provider <name> --provider-usage-file <path> |