voicegw.yaml reference
The voicegw.yaml file is the central configuration for
VoiceGateway. It is validated at startup using a Pydantic schema
with extra="forbid", which means any typo or unknown key produces
a clear error message before your gateway starts.
VoiceGateway searches for the config file in this order:
1. ./voicegw.yaml (current directory)
2. ~/.config/voicegateway/voicegw.yaml
3. /etc/voicegateway/voicegw.yaml

The config path can also be set with the VOICEGW_CONFIG environment
variable. See Environment variables.
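
The file lookup can be pictured as a first-match search over the listed paths. The sketch below is illustrative only: find_config and SEARCH_PATHS are hypothetical names, not VoiceGateway's API, and the way VOICEGW_CONFIG interacts with the search is deliberately left out because this page does not spell it out.

```python
from pathlib import Path
from typing import Iterable, Optional

# Search locations in the order given on this page.
SEARCH_PATHS = (
    Path("voicegw.yaml"),  # current directory
    Path("~/.config/voicegateway/voicegw.yaml"),
    Path("/etc/voicegateway/voicegw.yaml"),
)


def find_config(candidates: Iterable[Path] = SEARCH_PATHS) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None."""
    for candidate in candidates:
        path = candidate.expanduser()  # resolve a leading "~"
        if path.is_file():
            return path
    return None
```

Passing the candidates as an argument keeps the function easy to exercise against temporary files.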
Top-level sections
The config file has thirteen top-level sections. All are optional.

| Section | Purpose |
|---|---|
| providers | API keys and settings for each provider |
| models | Register custom model aliases |
| stacks | Named bundles of STT + LLM + TTS models |
| projects | Per-project tracking and budgets |
| fallbacks | Ordered fallback chains per modality |
| observability | Toggle latency, cost, and logging middleware |
| cost_tracking | SQLite database settings for cost persistence |
| latency | TTFB warning thresholds and percentile config |
| rate_limits | Per-provider request rate limits |
| ingest | Rate limits for the fleet collector ingest endpoint |
| retention | Age-out policy for collector data |
| workers | Background rollup and retention cadence |
| serve | Bind host and port for the daemon |
providers
Configure credentials and settings for each provider. Keys are
provider names matching VoiceGateway’s built-in provider
identifiers.
- api_key (string): API key, typically via ${ENV_VAR} substitution.
- base_url (string): override the default API endpoint.
- enabled (bool, default true): disable a provider without removing its config.
models
Register custom model aliases organised by modality. Each entry
maps an alias to a provider and model name, with optional
defaults.
stacks
Named bundles that map to one STT, one LLM, and one TTS model. Use
stacks to define preset quality / cost tiers.
projects
Define projects for cost attribution and budget enforcement. Each
project can override providers per-key.
budget_action is one of warn, throttle, or block. Project-
scoped providers override the top-level providers for that
project; otherwise the top-level keys apply.
See Projects.
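
The override rule can be read as a merge in which the project's entries win. The sketch below assumes the override is applied per provider name (a project entry replaces the whole top-level entry for that provider). That granularity is an assumption, and effective_providers, provider_a and provider_b are hypothetical names used only for illustration.

```python
from typing import Any, Dict

ProviderMap = Dict[str, Dict[str, Any]]


def effective_providers(top_level: ProviderMap, project: ProviderMap) -> ProviderMap:
    """Combine top-level and project-scoped providers for one project."""
    # Start from copies of the top-level entries so the inputs stay untouched.
    merged = {name: dict(settings) for name, settings in top_level.items()}
    # Assumption: a project-scoped entry replaces the top-level entry wholesale.
    for name, settings in project.items():
        merged[name] = dict(settings)
    return merged


top = {"provider_a": {"api_key": "top-a"}, "provider_b": {"api_key": "top-b"}}
scoped = {"provider_a": {"api_key": "project-a"}}
resolved = effective_providers(top, scoped)
# provider_a now uses the project key; provider_b keeps the top-level key.
```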
fallbacks
Ordered lists of model ids per modality. They act as a resolver-time
hint: the resolver walks each list at startup and picks the first
model whose provider plugin imports cleanly.
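
A minimal sketch of that walk is shown below. It assumes model ids have the form provider/model and that each provider maps to an importable Python module. Both are assumptions made for illustration, and pick_first_available and plugin_imports_cleanly are hypothetical names rather than VoiceGateway functions.

```python
import importlib
from typing import Callable, Mapping, Optional, Sequence


def plugin_imports_cleanly(module_name: str) -> bool:
    """Return True if the module can be imported without raising."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Any failure during import counts as "not clean".
        return False
    return True


def pick_first_available(
    chain: Sequence[str],
    plugin_modules: Mapping[str, str],
    can_import: Callable[[str], bool] = plugin_imports_cleanly,
) -> Optional[str]:
    """Walk the chain in order and return the first usable model id."""
    for model_id in chain:
        provider = model_id.split("/", 1)[0]  # assumed "provider/model" form
        module_name = plugin_modules.get(provider)
        if module_name is not None and can_import(module_name):
            return model_id
    return None
```

Because the check happens once at startup, a chain resolved this way does not by itself react to provider failures that occur later at request time.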
observability
Three boolean flags that control which middleware runs. All default
to true.
cost_tracking
Configure the SQLite storage backend for cost persistence.
- enabled (bool, default false): enable cost persistence. Also enabled automatically if VOICEGW_DB_PATH is set.
- db_path (string): path to the SQLite database file.
- daily_budget_alert (float, optional): global daily budget alert threshold.
latency
Configure latency monitoring thresholds.
- ttfb_warning_ms (float, default 500.0): time-to-first-byte warning threshold in milliseconds.
- percentiles (list of floats): which percentiles to track and report.
rate_limits
Per-provider rate limiting.
- requests_per_minute (int): maximum requests per minute for the given provider.
ingest
Rate limiting for the fleet collector ingest endpoint (POST /v1/ingest),
where remote agents push telemetry. Limiting is a per-caller token bucket
keyed by virtual key (then static API key, then client IP).
- enabled (bool, default true): turn ingest rate limiting on or off.
- requests_per_minute (int, default 120): sustained per-caller request rate. Set to 0 to disable limiting (unlimited).
- burst (int, default 240): token-bucket ceiling, the largest burst a caller can send before being throttled.
- max_batch_size (int, default 1000): maximum records in one POST. A larger batch is rejected with 413 before any database write.

A throttled request receives a 429 response with a Retry-After header (integer seconds).
The library’s remote sink honors Retry-After and retries without dropping the
batch, so transient throttling never loses telemetry.
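
To make the interaction between requests_per_minute, burst, and Retry-After concrete, here is a minimal token-bucket sketch. It is not VoiceGateway's implementation: the TokenBucket class, the choice to start each bucket full, and the rounding of the wait time up to whole seconds are assumptions for illustration.

```python
import math


class TokenBucket:
    """One bucket per caller; refills continuously at requests_per_minute."""

    def __init__(self, requests_per_minute: int = 120, burst: int = 240,
                 now: float = 0.0) -> None:
        self.requests_per_minute = requests_per_minute
        self.burst = burst
        self.tokens = float(burst)  # assumption: a new caller starts with a full bucket
        self.updated_at = now

    def try_acquire(self, now: float) -> int:
        """Return 0 if the request is allowed, else seconds to wait."""
        if self.requests_per_minute == 0:
            return 0  # 0 disables limiting
        rate = self.requests_per_minute / 60.0  # tokens per second
        elapsed = max(0.0, now - self.updated_at)
        # Refill, but never above the burst ceiling.
        self.tokens = min(float(self.burst), self.tokens + elapsed * rate)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 0
        # Whole seconds until one token is available (a Retry-After value).
        return max(1, math.ceil((1.0 - self.tokens) / rate))
```

A per-caller limiter would keep one such bucket in a dictionary keyed by the caller identity (virtual key, then static API key, then client IP) and answer 429 whenever try_acquire returns a non-zero wait.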
retention
Hard-delete aged rows from the collector database. A background worker prunes,
per project, sessions and their dependent rows (replay, turns, dead-air,
guardrail) by ended_at, and requests by timestamp, in batches.
- enabled (bool, default true): turn retention pruning on or off.
- default_days (int, default 90): age after which a project’s rows are deleted. Applies to every project that has data.
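
Batched pruning can be sketched as repeated bounded deletes. The example below assumes a SQLite store with a hypothetical requests table (id, project, timestamp as Unix seconds). The real collector schema and storage engine are not described on this page, so treat every table and function name here as illustrative.

```python
import sqlite3

SECONDS_PER_DAY = 86400


def cutoff_for(now: float, default_days: int = 90) -> float:
    """Rows with a timestamp older than this value are eligible for deletion."""
    return now - default_days * SECONDS_PER_DAY


def prune_requests(conn: sqlite3.Connection, project: str, cutoff: float,
                   batch_size: int = 500) -> int:
    """Delete one project's aged rows in batches; return the number deleted."""
    deleted = 0
    while True:
        cursor = conn.execute(
            "DELETE FROM requests WHERE id IN ("
            " SELECT id FROM requests"
            " WHERE project = ? AND timestamp < ? LIMIT ?)",
            (project, cutoff, batch_size),
        )
        conn.commit()  # commit per batch so each transaction stays short
        if cursor.rowcount == 0:
            return deleted
        deleted += cursor.rowcount
```

Deleting in bounded batches avoids holding one long write transaction, and running the function again after it has finished removes nothing further.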
workers
Cadence for the collector’s background workers: the latency and agent rollups,
and the retention prune. Workers run in-process and are started by the server.
In a multi-replica deployment, set enabled: false on every replica except the
one chosen to run them (rollups and prunes are idempotent, but running them on
every replica is wasteful).
- enabled (bool, default true): start the background workers. When false, no workers run (the rollup tables stay stale and retention does not prune).
- rollup_interval_seconds (int, default 900): how often the latency and agent rollups refresh. The Agents dashboard list serves this 24h rollup.
- retention_interval_seconds (int, default 3600): how often retention runs.
serve
Bind host and port for the daemon. The daemon serves the HTTP API
(/v1/*), the dashboard API (/api/*), and the React SPA (/)
all on this single port.
- host (string, default 0.0.0.0): bind address. Use 127.0.0.1 to restrict to localhost.
- port (int, default 8080): port number. The wizard collects this as question 4 of voicegw onboard.
Environment variable substitution
Any string value in the config can use ${ENV_VAR} syntax.
VoiceGateway substitutes these at load time using os.environ.
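
The substitution step can be approximated with a regular expression over each string value. This is a sketch, not the gateway's code: substitute_env is a hypothetical helper, and leaving a reference untouched when the variable is unset is an assumption, since this page does not say how unset variables are handled.

```python
import os
import re
from typing import Mapping

# Matches ${NAME} where NAME is a conventional environment variable name.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: str, environ: Mapping[str, str] = os.environ) -> str:
    """Replace each ${NAME} in value with environ[NAME]."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        # Assumption: an unset variable leaves the reference as written.
        return environ.get(name, match.group(0))

    return _ENV_REF.sub(replace, value)
```

For example, substitute_env("${API_KEY}", {"API_KEY": "abc"}) returns "abc", while a reference to a variable missing from the mapping is returned unchanged.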