voicegw.yaml reference

The voicegw.yaml file is the central configuration for VoiceGateway. It is validated at startup using a Pydantic schema with extra="forbid", which means any typo or unknown key produces a clear error message before your gateway starts. VoiceGateway searches for the config file in this order:

./voicegw.yaml (current directory)
~/.config/voicegateway/voicegw.yaml
/etc/voicegateway/voicegw.yaml

You can override this with the VOICEGW_CONFIG environment variable. See Environment variables.
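The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not VoiceGateway's actual loader: the function name `find_config` is hypothetical, and it assumes that `VOICEGW_CONFIG`, when set, is used as-is and skips the search entirely.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Search order as documented above; the first existing file wins.
SEARCH_PATHS = (
    Path("voicegw.yaml"),
    Path("~/.config/voicegateway/voicegw.yaml"),
    Path("/etc/voicegateway/voicegw.yaml"),
)


def find_config(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the config path to load, or None if no file is found.

    Assumption: a non-empty VOICEGW_CONFIG bypasses the search order.
    """
    override = env.get("VOICEGW_CONFIG")
    if override:
        return Path(override).expanduser()
    for candidate in SEARCH_PATHS:
        path = candidate.expanduser()
        if path.is_file():
            return path
    return None
```

Passing the environment as a parameter keeps the function easy to exercise in tests without touching the real process environment.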

Top-level sections

The config file has thirteen top-level sections. All are optional.

Section	Purpose
`providers`	API keys and settings for each provider
`models`	Register custom model aliases
`stacks`	Named bundles of STT + LLM + TTS models
`projects`	Per-project tracking and budgets
`fallbacks`	Ordered fallback chains per modality
`observability`	Toggle latency, cost, and logging middleware
`cost_tracking`	SQLite database settings for cost persistence
`latency`	TTFB warning thresholds and percentile config
`rate_limits`	Per-provider request rate limits
`ingest`	Rate limits for the fleet collector ingest endpoint
`retention`	Age-out policy for collector data
`workers`	Background rollup and retention cadence
`serve`	Bind host and port for the daemon
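To illustrate what rejecting unknown keys means at the top level, here is a standard-library stand-in for the behaviour that `extra="forbid"` provides. It is not the Pydantic schema VoiceGateway uses; the function name `check_top_level_keys` and the error wording are assumptions made for this sketch.

```python
# The thirteen sections from the table above.
KNOWN_TOP_LEVEL_KEYS = frozenset({
    "providers", "models", "stacks", "projects", "fallbacks",
    "observability", "cost_tracking", "latency", "rate_limits",
    "ingest", "retention", "workers", "serve",
    # Also appears as a top-level key in the `projects` example on this page.
    "default_project",
})


def check_top_level_keys(config: dict) -> None:
    """Raise ValueError naming every key that is not a known section."""
    unknown = sorted(set(config) - KNOWN_TOP_LEVEL_KEYS)
    if unknown:
        raise ValueError(f"unknown top-level key(s): {', '.join(unknown)}")
```

With this check, a typo such as `provider:` in place of `providers:` fails immediately instead of being silently ignored.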

`providers`

Configure credentials and settings for each provider. Keys are provider names matching VoiceGateway’s built-in provider identifiers.

providers:
  deepgram:
    api_key: ${DEEPGRAM_API_KEY}
  openai:
    api_key: ${OPENAI_API_KEY}
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
  groq:
    api_key: ${GROQ_API_KEY}
  cartesia:
    api_key: ${CARTESIA_API_KEY}
  elevenlabs:
    api_key: ${ELEVENLABS_API_KEY}
  assemblyai:
    api_key: ${ASSEMBLYAI_API_KEY}
  ollama:
    base_url: http://localhost:11434
  whisper:
    enabled: true
  kokoro:
    enabled: true
  piper:
    enabled: true

Each provider supports at minimum:

api_key (string): API key, typically via ${ENV_VAR} substitution.
base_url (string): override the default API endpoint.
enabled (bool, default true): disable a provider without removing its config.

See Providers for per-provider details.

`models`

Register custom model aliases organised by modality. Each entry maps an alias to a provider and model name, with optional defaults.

models:
  stt:
    fast-transcription:
      provider: deepgram
      model: nova-3
    offline-transcription:
      provider: whisper
      model: large-v3
  llm:
    reasoning:
      provider: anthropic
      model: claude-sonnet-4-5
  tts:
    narrator:
      provider: cartesia
      model: sonic-3
      default_voice: narrator-male

See Models.

`stacks`

Named bundles that map to one STT, one LLM, and one TTS model. Use stacks to define preset quality / cost tiers.

stacks:
  premium:
    stt: deepgram/nova-3
    llm: anthropic/claude-sonnet-4-5
    tts: cartesia/sonic-3
  budget:
    stt: groq/whisper-large-v3
    llm: groq/llama-3.3-70b-versatile
    tts: local/piper:en_US-lessac-medium
  local:
    stt: local/whisper-large-v3
    llm: ollama/llama3.2:3b
    tts: local/kokoro

See Stacks.
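The stack entries above use ids of the form provider/model. A minimal sketch of splitting such an id, assuming the first slash is the separator and everything after it (including colons, as in ollama/llama3.2:3b) belongs to the model name. `ModelRef` and `parse_model_id` are hypothetical names, not part of VoiceGateway's API.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    provider: str
    model: str


def parse_model_id(model_id: str) -> ModelRef:
    """Split 'provider/model' on the first slash (assumed separator)."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return ModelRef(provider, model)
```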

`projects`

Define projects for cost attribution and budget enforcement. Each project can override providers per-key.

projects:
  customer-support:
    name: Customer Support Bot
    description: Production support agent
    default_stack: premium
    daily_budget: 50.00
    budget_action: throttle
    tags: [prod, support]
    providers:
      deepgram:
        api_key: ${SUPPORT_DEEPGRAM_KEY}
      anthropic:
        api_key: ${SUPPORT_ANTHROPIC_KEY}
  internal-qa:
    name: Internal QA Bot
    description: Testing and QA agent
    default_stack: budget
    daily_budget: 10.00
    budget_action: warn
    tags: [dev, qa]

default_project: customer-support

budget_action is one of warn, throttle, or block. Project-scoped providers override the top-level providers for that project; for any provider a project does not override, the top-level keys apply. See Projects.
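The override rule can be pictured as a merge of the project's providers over the top-level ones. The sketch below assumes the merge happens key by key within each provider, so a project that overrides only api_key keeps the top-level base_url. That granularity and the function name `effective_providers` are assumptions for illustration.

```python
from copy import deepcopy


def effective_providers(config: dict, project_id: str) -> dict:
    """Return the provider settings that apply to one project."""
    # Start from the top-level providers, without mutating the input.
    merged = deepcopy(config.get("providers", {}))
    project = config.get("projects", {}).get(project_id, {})
    # Layer the project-scoped settings on top, key by key (assumed).
    for name, overrides in project.get("providers", {}).items():
        merged.setdefault(name, {}).update(overrides)
    return merged
```

For the example above, customer-support would use its own Deepgram and Anthropic keys, while internal-qa, which defines no providers, would use the top-level keys unchanged.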

`fallbacks`

Ordered lists of model ids per modality. The resolver uses each list as a hint: at startup it walks the list in order and picks the first model whose provider plugin imports cleanly.

fallbacks:
  stt:
    - deepgram/nova-3
    - openai/whisper-1
    - local/whisper-large-v3
  llm:
    - anthropic/claude-sonnet-4-5
    - openai/gpt-4.1-mini
    - ollama/llama3.2:3b
  tts:
    - cartesia/sonic-3
    - elevenlabs/eleven_multilingual_v2
    - local/kokoro
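A rough sketch of that startup walk, using importlib to test whether a plugin module can be imported. The mapping from provider name to module name (`plugin_modules`) is hypothetical; how VoiceGateway actually locates its provider plugins is not described here.

```python
import importlib
from typing import Mapping, Optional, Sequence


def first_importable(
    model_ids: Sequence[str],
    plugin_modules: Mapping[str, str],
) -> Optional[str]:
    """Return the first model id whose provider module imports cleanly."""
    for model_id in model_ids:
        provider = model_id.split("/", 1)[0]
        module_name = plugin_modules.get(provider)
        if module_name is None:
            continue  # no known plugin for this provider
        try:
            importlib.import_module(module_name)
        except ImportError:
            continue  # plugin not installed; try the next entry
        return model_id
    return None
```

Note that this only checks importability at startup; it does not model retrying a failed request against the next provider.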

`observability`

Three boolean flags that control which middleware runs. All default to true.

observability:
  latency_tracking: true
  cost_tracking: true
  request_logging: true

See Observability.

`cost_tracking`

Configure the SQLite storage backend for cost persistence.

cost_tracking:
  enabled: true
  db_path: ~/.config/voicegateway/voicegw.db
  daily_budget_alert: 100.00

enabled (bool, default false): enable cost persistence. Also enabled automatically if VOICEGW_DB_PATH is set.
db_path (string): path to the SQLite database file.
daily_budget_alert (float, optional): global daily budget alert threshold.

`latency`

Configure latency monitoring thresholds.

latency:
  ttfb_warning_ms: 500.0
  percentiles: [50.0, 95.0, 99.0]

ttfb_warning_ms (float, default 500.0): time-to-first-byte warning threshold in milliseconds.
percentiles (list of floats): which percentiles to track and report.
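To make the two settings concrete, the sketch below summarizes a list of TTFB samples. It uses the nearest-rank percentile method and treats a sample as slow when it is strictly above ttfb_warning_ms. Both choices are assumptions; the method VoiceGateway uses internally is not specified here, and `summarize_ttfb` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence


def percentile(sorted_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of an ascending, non-empty sequence."""
    rank = max(1, math.ceil(p / 100.0 * len(sorted_ms)))
    return sorted_ms[rank - 1]


def summarize_ttfb(
    samples_ms: Sequence[float],
    percentiles: Sequence[float] = (50.0, 95.0, 99.0),
    ttfb_warning_ms: float = 500.0,
) -> Dict[str, object]:
    """Report the requested percentiles and the samples over the threshold."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    slow: List[float] = [ms for ms in samples_ms if ms > ttfb_warning_ms]
    return {
        "percentiles": {p: percentile(ordered, p) for p in percentiles},
        "slow_requests": slow,
    }
```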

`rate_limits`

Per-provider rate limiting.

rate_limits:
  deepgram:
    requests_per_minute: 100
  openai:
    requests_per_minute: 60

requests_per_minute (int): maximum requests per minute for the given provider.

`ingest`

Rate limiting for the fleet collector ingest endpoint (POST /v1/ingest), where remote agents push telemetry. Limiting uses a per-caller token bucket, keyed by virtual key, falling back to the static API key, then the client IP.

ingest:
  enabled: true
  requests_per_minute: 120
  burst: 240
  max_batch_size: 1000

enabled (bool, default true): turn ingest rate limiting on or off.
requests_per_minute (int, default 120): sustained per-caller request rate. Set to 0 to disable limiting (unlimited).
burst (int, default 240): token-bucket ceiling, the largest burst a caller can send before being throttled.
max_batch_size (int, default 1000): maximum records in one POST. A larger batch is rejected with 413 before any database write.

Over-limit requests get 429 with a Retry-After header (integer seconds). The library’s remote sink honors Retry-After and retries without dropping the batch, so transient throttling never loses telemetry.
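The interaction between requests_per_minute, burst, and Retry-After is easier to see in code. The following is a generic token-bucket sketch under stated assumptions, not the collector's implementation: the class and function names are hypothetical, and the Retry-After value is assumed to be the time until one token is available, rounded up to a whole second.

```python
import math
import time
from typing import Callable, Dict, Optional, Tuple


def caller_key(virtual_key: Optional[str], api_key: Optional[str], client_ip: str) -> str:
    """Pick the bucket key: virtual key, then static API key, then client IP."""
    return virtual_key or api_key or client_ip


class IngestLimiter:
    def __init__(self, requests_per_minute: int = 120, burst: int = 240,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = requests_per_minute / 60.0  # tokens refilled per second
        self.burst = float(burst)               # bucket ceiling
        self.clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last_seen)

    def check(self, key: str) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds) for one request."""
        if self.rate <= 0:
            return True, 0  # requests_per_minute: 0 disables limiting
        now = self.clock()
        tokens, last = self._buckets.get(key, (self.burst, now))
        # Refill for the elapsed time, capped at the burst ceiling.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[key] = (tokens - 1.0, now)
            return True, 0
        self._buckets[key] = (tokens, now)
        # Seconds until one full token is available, as an integer.
        return False, max(1, math.ceil((1.0 - tokens) / self.rate))
```

A server built on this would answer a denied request with 429 and put the second value in the Retry-After header. With the defaults, a caller can send 240 requests at once and then sustain two per second.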

`retention`

Retention hard-deletes aged rows from the collector database. For each project, a background worker prunes in batches: sessions and their dependent rows (replay, turns, dead-air, guardrail) are selected by ended_at, and requests by timestamp.

retention:
  enabled: true
  default_days: 90

enabled (bool, default true): turn retention pruning on or off.
default_days (int, default 90): age after which a project’s rows are deleted. Applies to every project that has data.
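A batched prune of the requests table might look like the sketch below. The schema is hypothetical: a `requests` table with `project` and `timestamp` columns, with timestamps stored as Unix epoch seconds. The batch size is also an assumption, since the real worker's batch size is not documented here.

```python
import sqlite3
import time
from typing import Optional


def prune_requests(conn: sqlite3.Connection, project: str,
                   default_days: int = 90, batch_size: int = 500,
                   now: Optional[float] = None) -> int:
    """Hard-delete one project's requests older than default_days, in batches."""
    now = time.time() if now is None else now
    cutoff = now - default_days * 86400  # rows older than this are removed
    deleted = 0
    while True:
        cur = conn.execute(
            "DELETE FROM requests WHERE rowid IN ("
            " SELECT rowid FROM requests"
            " WHERE project = ? AND timestamp < ? LIMIT ?)",
            (project, cutoff, batch_size),
        )
        conn.commit()  # commit per batch so locks are held only briefly
        if cur.rowcount == 0:
            return deleted
        deleted += cur.rowcount
```

Deleting in small batches keeps each transaction short, which matters for SQLite because writers block one another.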

`workers`

Cadence for the collector’s background workers: the latency and agent rollups, and the retention prune. Workers run in-process and are started by the server. In a multi-replica deployment, set enabled: false on every replica except the one chosen to run them (rollups and prunes are idempotent, but running them on every replica is wasteful).

workers:
  enabled: true
  rollup_interval_seconds: 900
  retention_interval_seconds: 3600

enabled (bool, default true): start the background workers. When false, no workers run (the rollup tables stay stale and retention does not prune).
rollup_interval_seconds (int, default 900): how often the latency and agent rollups refresh. The Agents dashboard list is served from this 24h rollup.
retention_interval_seconds (int, default 3600): how often retention runs.

`serve`

Bind host and port for the daemon. The daemon serves the HTTP API (/v1/*), the dashboard API (/api/*), and the React SPA (/) all on this single port.

serve:
  host: 0.0.0.0
  port: 8080

host (string, default 0.0.0.0): bind address. Use 127.0.0.1 to restrict to localhost.
port (int, default 8080): port number. The wizard collects this as question 4 of voicegw onboard.

Environment variable substitution

Any string value in the config can use ${ENV_VAR} syntax. VoiceGateway substitutes these at load time using os.environ.

providers:
  deepgram:
    api_key: ${DEEPGRAM_API_KEY}

If the environment variable is not set, it resolves to an empty string. See Environment variables.
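The substitution rule can be sketched with a regular expression applied recursively to the parsed config. This is an illustration only: the function name `substitute_env` and the exact pattern for variable names are assumptions, though the empty-string fallback for unset variables follows the behaviour described above.

```python
import os
import re
from typing import Any, Mapping

# Matches ${NAME}; the allowed characters in NAME are an assumption.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Replace ${ENV_VAR} in every string; unset variables become ''."""
    if isinstance(value, str):
        return _ENV_REF.sub(lambda m: env.get(m.group(1), ""), value)
    if isinstance(value, dict):
        return {k: substitute_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_env(v, env) for v in value]
    return value  # numbers, booleans, and None pass through unchanged
```

Because an unset variable yields an empty string rather than an error, a missing API key surfaces only when the provider is first used, so it is worth checking the environment before starting the gateway.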
