voicegw.yaml reference

The voicegw.yaml file is the central configuration for VoiceGateway. It is validated at startup against a Pydantic schema with extra="forbid", so any typo or unknown key produces a clear error message before the gateway starts.

VoiceGateway searches for the config file in this order:
  1. ./voicegw.yaml (current directory)
  2. ~/.config/voicegateway/voicegw.yaml
  3. /etc/voicegateway/voicegw.yaml
You can override this with the VOICEGW_CONFIG environment variable. See Environment variables.
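
The lookup described above can be sketched roughly as follows. This is an illustration, not VoiceGateway's actual loader: find_config is a hypothetical helper, and it assumes that VOICEGW_CONFIG, when set, wins outright without being checked against the search list.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence

# Search locations in the documented order.
SEARCH_PATHS = (
    Path("voicegw.yaml"),                         # current directory
    Path("~/.config/voicegateway/voicegw.yaml"),
    Path("/etc/voicegateway/voicegw.yaml"),
)


def find_config(
    env: Mapping[str, str] = os.environ,
    search_paths: Sequence[Path] = SEARCH_PATHS,
) -> Optional[Path]:
    """Return the config path to load, or None if nothing is found.

    Hypothetical helper: assumes VOICEGW_CONFIG takes precedence when set.
    """
    override = env.get("VOICEGW_CONFIG")
    if override:
        return Path(override).expanduser()
    for candidate in search_paths:
        path = candidate.expanduser()
        if path.is_file():
            return path
    return None
```

Passing env and search_paths as arguments keeps the function easy to exercise in tests without touching the real home directory or /etc.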

Top-level sections

The config file has thirteen top-level sections. All are optional.
Section | Purpose
--- | ---
providers | API keys and settings for each provider
models | Register custom model aliases
stacks | Named bundles of STT + LLM + TTS models
projects | Per-project tracking and budgets
fallbacks | Ordered fallback chains per modality
observability | Toggle latency, cost, and logging middleware
cost_tracking | SQLite database settings for cost persistence
latency | TTFB warning thresholds and percentile config
rate_limits | Per-provider request rate limits
ingest | Rate limits for the fleet collector ingest endpoint
retention | Age-out policy for collector data
workers | Background rollup and retention cadence
serve | Bind host and port for the daemon
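
To get a feel for what rejecting unknown keys means at the top level, here is a standard-library approximation. It is not the Pydantic schema VoiceGateway uses; check_top_level_keys and the key set are illustrative, built only from the sections listed above plus default_project, which appears as a top-level key in the projects example on this page.

```python
import difflib

# The thirteen documented sections, plus default_project from the
# projects example. The real schema may accept other keys (assumption).
KNOWN_TOP_LEVEL_KEYS = {
    "providers", "models", "stacks", "projects", "fallbacks",
    "observability", "cost_tracking", "latency", "rate_limits",
    "ingest", "retention", "workers", "serve", "default_project",
}


def check_top_level_keys(config: dict) -> list[str]:
    """Return one message per unknown key, with a spelling hint if one is close."""
    errors = []
    for key in config:
        if key in KNOWN_TOP_LEVEL_KEYS:
            continue
        hint = difflib.get_close_matches(key, sorted(KNOWN_TOP_LEVEL_KEYS), n=1)
        suffix = f" (did you mean '{hint[0]}'?)" if hint else ""
        errors.append(f"unknown top-level key '{key}'{suffix}")
    return errors
```

A config containing fallback instead of fallbacks would produce one message naming the misspelled key, which is the kind of early failure the schema validation is meant to give.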

providers

Configure credentials and settings for each provider. Keys are provider names matching VoiceGateway’s built-in provider identifiers.
providers:
  deepgram:
    api_key: ${DEEPGRAM_API_KEY}
  openai:
    api_key: ${OPENAI_API_KEY}
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
  groq:
    api_key: ${GROQ_API_KEY}
  cartesia:
    api_key: ${CARTESIA_API_KEY}
  elevenlabs:
    api_key: ${ELEVENLABS_API_KEY}
  assemblyai:
    api_key: ${ASSEMBLYAI_API_KEY}
  ollama:
    base_url: http://localhost:11434
  whisper:
    enabled: true
  kokoro:
    enabled: true
  piper:
    enabled: true
Each provider supports at minimum:
  • api_key (string): API key, typically via ${ENV_VAR} substitution.
  • base_url (string): override the default API endpoint.
  • enabled (bool, default true): disable a provider without removing its config.
See Providers for per-provider details.

models

Register custom model aliases organised by modality. Each entry maps an alias to a provider and model name, with optional defaults.
models:
  stt:
    fast-transcription:
      provider: deepgram
      model: nova-3
    offline-transcription:
      provider: whisper
      model: large-v3
  llm:
    reasoning:
      provider: anthropic
      model: claude-sonnet-4-5
  tts:
    narrator:
      provider: cartesia
      model: sonic-3
      default_voice: narrator-male
See Models.

stacks

Named bundles that map to one STT, one LLM, and one TTS model. Use stacks to define preset quality / cost tiers.
stacks:
  premium:
    stt: deepgram/nova-3
    llm: anthropic/claude-sonnet-4-5
    tts: cartesia/sonic-3
  budget:
    stt: groq/whisper-large-v3
    llm: groq/llama-3.3-70b-versatile
    tts: local/piper:en_US-lessac-medium
  local:
    stt: local/whisper-large-v3
    llm: ollama/llama3.2:3b
    tts: local/kokoro
See Stacks.
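
The stack entries above look like prefix/model ids, where the prefix is a provider name or local. A minimal sketch of splitting them, assuming the first slash is the separator and everything after it (including colons) belongs to the model name; split_model_id and resolve_stack are hypothetical helpers, not part of VoiceGateway's API:

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split 'prefix/model' on the first slash only (assumed convention)."""
    prefix, sep, model = model_id.partition("/")
    if not sep or not prefix or not model:
        raise ValueError(f"expected 'prefix/model', got {model_id!r}")
    return prefix, model


def resolve_stack(stack: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map each modality in a stack to its (prefix, model) pair."""
    return {modality: split_model_id(mid) for modality, mid in stack.items()}


# The budget stack from the example above.
budget = {
    "stt": "groq/whisper-large-v3",
    "llm": "groq/llama-3.3-70b-versatile",
    "tts": "local/piper:en_US-lessac-medium",
}
print(resolve_stack(budget)["tts"])  # ('local', 'piper:en_US-lessac-medium')
```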

projects

Define projects for cost attribution and budget enforcement. Each project can override providers per-key.
projects:
  customer-support:
    name: Customer Support Bot
    description: Production support agent
    default_stack: premium
    daily_budget: 50.00
    budget_action: throttle
    tags: [prod, support]
    providers:
      deepgram:
        api_key: ${SUPPORT_DEEPGRAM_KEY}
      anthropic:
        api_key: ${SUPPORT_ANTHROPIC_KEY}
  internal-qa:
    name: Internal QA Bot
    description: Testing and QA agent
    default_stack: budget
    daily_budget: 10.00
    budget_action: warn
    tags: [dev, qa]

default_project: customer-support
budget_action is one of warn, throttle, or block. Project-scoped providers override the top-level providers for that project; otherwise the top-level keys apply. See Projects.
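
One way to picture the override rule is as a merge of the project's providers block over the top-level one. The sketch below assumes the merge is per provider and per field, so a project that only sets api_key keeps any other top-level setting for that provider. Whether VoiceGateway merges at field level or replaces the whole provider entry is not stated here, and effective_providers is a hypothetical name.

```python
def effective_providers(top_level: dict, project: dict) -> dict:
    """Merge project-scoped provider settings over the top-level ones.

    Assumption: field-level merge per provider; inputs are not mutated.
    """
    merged = {name: dict(settings) for name, settings in top_level.items()}
    for name, overrides in project.get("providers", {}).items():
        merged.setdefault(name, {}).update(overrides)
    return merged


top = {
    "deepgram": {"api_key": "global-dg", "base_url": "https://example.invalid"},
    "openai": {"api_key": "global-oa"},
}
support = {"providers": {"deepgram": {"api_key": "support-dg"}}}
resolved = effective_providers(top, support)
```

Here resolved uses the support key for deepgram and falls back to the top-level key for openai, matching the rule that top-level keys apply when a project does not override them.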

fallbacks

Ordered lists of model ids per modality. The resolver uses each list as a hint: at startup it walks the list and picks the first model whose provider plugin imports cleanly.
fallbacks:
  stt:
    - deepgram/nova-3
    - openai/whisper-1
    - local/whisper-large-v3
  llm:
    - anthropic/claude-sonnet-4-5
    - openai/gpt-4.1-mini
    - ollama/llama3.2:3b
  tts:
    - cartesia/sonic-3
    - elevenlabs/eleven_multilingual_v2
    - local/kokoro
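
The walk over a fallback chain can be sketched as below. How VoiceGateway maps a model id to a plugin module is not documented on this page, so plugin_module_for is a caller-supplied, hypothetical mapping, and "imports cleanly" is taken to mean that importing the module raises no exception.

```python
import importlib
from typing import Callable, Optional, Sequence


def plugin_imports_cleanly(module_name: str) -> bool:
    """True if the module can be imported without raising."""
    try:
        importlib.import_module(module_name)
    except Exception:
        return False
    return True


def pick_first_available(
    chain: Sequence[str],
    plugin_module_for: Callable[[str], str],
    can_import: Callable[[str], bool] = plugin_imports_cleanly,
) -> Optional[str]:
    """Return the first model id in the chain whose provider plugin imports."""
    for model_id in chain:
        prefix = model_id.split("/", 1)[0]  # part before the first slash
        if can_import(plugin_module_for(prefix)):
            return model_id
    return None
```

Because the check happens at startup and only tests importability, a sketch like this says nothing about runtime failover when a provider's API is down.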

observability

Three boolean flags that control which middleware runs. All default to true.
observability:
  latency_tracking: true
  cost_tracking: true
  request_logging: true
See Observability.

cost_tracking

Configure the SQLite storage backend for cost persistence.
cost_tracking:
  enabled: true
  db_path: ~/.config/voicegateway/voicegw.db
  daily_budget_alert: 100.00
  • enabled (bool, default false): enable cost persistence. Also enabled automatically if VOICEGW_DB_PATH is set.
  • db_path (string): path to the SQLite database file.
  • daily_budget_alert (float, optional): global daily budget alert threshold.

latency

Configure latency monitoring thresholds.
latency:
  ttfb_warning_ms: 500.0
  percentiles: [50.0, 95.0, 99.0]
  • ttfb_warning_ms (float, default 500.0): time-to-first-byte warning threshold in milliseconds.
  • percentiles (list of floats): which percentiles to track and report.
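
As an illustration of how these two settings might be applied to a set of TTFB samples, the sketch below uses the nearest-rank percentile method and treats a sample as a warning when it is strictly above the threshold. Both choices are assumptions; the page does not say which percentile method or comparison VoiceGateway uses, and the function names are hypothetical.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample list (0 < pct <= 100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize_ttfb(samples_ms, percentiles=(50.0, 95.0, 99.0), ttfb_warning_ms=500.0):
    """Report the configured percentiles and count samples over the threshold."""
    report = {f"p{pct:g}": percentile(samples_ms, pct) for pct in percentiles}
    report["warnings"] = sum(1 for s in samples_ms if s > ttfb_warning_ms)
    return report


report = summarize_ttfb([100.0, 200.0, 300.0, 400.0, 600.0])
# report == {"p50": 300.0, "p95": 600.0, "p99": 600.0, "warnings": 1}
```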

rate_limits

Per-provider rate limiting.
rate_limits:
  deepgram:
    requests_per_minute: 100
  openai:
    requests_per_minute: 60
  • requests_per_minute (int): maximum requests per minute for the given provider.

ingest

Rate limiting for the fleet collector ingest endpoint (POST /v1/ingest), where remote agents push telemetry. Limiting is a per-caller token bucket keyed by virtual key (then static API key, then client IP).
ingest:
  enabled: true
  requests_per_minute: 120
  burst: 240
  max_batch_size: 1000
  • enabled (bool, default true): turn ingest rate limiting on or off.
  • requests_per_minute (int, default 120): sustained per-caller request rate. Set to 0 to disable limiting (unlimited).
  • burst (int, default 240): token-bucket ceiling, the largest burst a caller can send before being throttled.
  • max_batch_size (int, default 1000): maximum records in one POST. A larger batch is rejected with 413 before any database write.
Over-limit requests get 429 with a Retry-After header (integer seconds). The library’s remote sink honors Retry-After and retries without dropping the batch, so transient throttling never loses telemetry.
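
The per-caller token bucket can be sketched as follows. This is a generic token bucket under stated assumptions, not the collector's code: the class and function names are hypothetical, the clock is passed in explicitly to keep the example deterministic, and the "anonymous" fallback key is invented for the case where no identifier is available.

```python
import math


class TokenBucket:
    """Per-caller bucket: refills at requests_per_minute, capped at burst."""

    def __init__(self, requests_per_minute: int = 120, burst: int = 240, now: float = 0.0):
        self.rate = requests_per_minute / 60.0  # tokens per second
        self.capacity = float(burst)
        self.tokens = float(burst)              # start full
        self.updated = now

    def try_acquire(self, now: float) -> int:
        """Return 0 if the request is allowed, else Retry-After in whole seconds."""
        if self.rate <= 0:
            return 0  # requests_per_minute = 0 means unlimited
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 0
        # Seconds until one full token is available, rounded up.
        return max(1, math.ceil((1.0 - self.tokens) / self.rate))


def caller_key(virtual_key=None, api_key=None, client_ip=None) -> str:
    """Pick the bucket key in the documented order of preference."""
    return virtual_key or api_key or client_ip or "anonymous"
```

In a server, now would typically come from time.monotonic(), and a nonzero return value would become a 429 response with that number in the Retry-After header.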

retention

Retention hard-deletes aged rows from the collector database. For each project, a background worker prunes in batches: sessions and their dependent rows (replay, turns, dead-air, guardrail) are selected by ended_at, and requests by timestamp.
retention:
  enabled: true
  default_days: 90
  • enabled (bool, default true): turn retention pruning on or off.
  • default_days (int, default 90): age after which a project’s rows are deleted. Applies to every project that has data.
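
Batched pruning of this kind is commonly written as a loop of small deletes. The sketch below shows the idea for requests only, using SQLite. The table layout (a requests table with project and timestamp columns), the batch size, and the function name are assumptions made for illustration; the collector's real schema is not shown on this page.

```python
import sqlite3


def prune_requests(conn: sqlite3.Connection, project: str, cutoff: float,
                   batch_size: int = 500) -> int:
    """Delete a project's requests older than cutoff, one batch per transaction."""
    deleted = 0
    while True:
        with conn:  # commits each batch, keeping transactions short
            cur = conn.execute(
                "DELETE FROM requests WHERE rowid IN ("
                "  SELECT rowid FROM requests"
                "  WHERE project = ? AND timestamp < ? LIMIT ?)",
                (project, cutoff, batch_size),
            )
        if cur.rowcount <= 0:
            return deleted
        deleted += cur.rowcount
```

With default_days: 90, the cutoff would be the current time minus 90 * 86400 seconds, assuming timestamps are stored as Unix epoch seconds.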

workers

Cadence for the collector’s background workers: the latency and agent rollups, and the retention prune. Workers run in-process and are started by the server. In a multi-replica deployment, set enabled: false on every replica except the one chosen to run them (rollups and prunes are idempotent, but running them on every replica is wasteful).
workers:
  enabled: true
  rollup_interval_seconds: 900
  retention_interval_seconds: 3600
  • enabled (bool, default true): start the background workers. When false, no workers run (the rollup tables stay stale and retention does not prune).
  • rollup_interval_seconds (int, default 900): how often the latency and agent rollups refresh. The Agents dashboard list is served from this 24h rollup.
  • retention_interval_seconds (int, default 3600): how often retention runs.

serve

Bind host and port for the daemon. The daemon serves the HTTP API (/v1/*), the dashboard API (/api/*), and the React SPA (/) all on this single port.
serve:
  host: 0.0.0.0
  port: 8080
  • host (string, default 0.0.0.0): bind address. Use 127.0.0.1 to restrict to localhost.
  • port (int, default 8080): port number. The wizard collects this as question 4 of voicegw onboard.

Environment variable substitution

Any string value in the config can use ${ENV_VAR} syntax. VoiceGateway substitutes these at load time using os.environ.
providers:
  deepgram:
    api_key: ${DEEPGRAM_API_KEY}
If the environment variable is not set, it resolves to an empty string. See Environment variables.
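
The substitution rule can be sketched as a recursive pass over the parsed config. This is a minimal illustration of the documented behaviour, including the empty-string result for unset variables; substitute_env is a hypothetical name, and the accepted variable-name pattern is an assumption.

```python
import os
import re
from typing import Any, Mapping

# Matches ${NAME}; the allowed name characters are an assumption.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace ${NAME} in strings; unset variables become ''."""
    if isinstance(value, str):
        return _ENV_REF.sub(lambda m: env.get(m.group(1), ""), value)
    if isinstance(value, dict):
        return {k: substitute_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_env(v, env) for v in value]
    return value  # numbers, booleans, None pass through unchanged


config = {
    "providers": {
        "deepgram": {"api_key": "${DEEPGRAM_API_KEY}"},
        "openai": {"api_key": "${OPENAI_API_KEY}"},
    },
    "serve": {"port": 8080},
}
resolved = substitute_env(config, env={"DEEPGRAM_API_KEY": "dg-123"})
```

In this run the openai key resolves to an empty string because OPENAI_API_KEY is missing from env, so checking for empty api_key values after loading is a cheap way to spot a forgotten variable.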