# voicegw.yaml reference

> Every top-level section and key in the VoiceGateway config file. Validated with Pydantic `extra="forbid"`, so typos fail fast at startup.

The `voicegw.yaml` file is the central configuration for
VoiceGateway. It is validated at startup using a Pydantic schema
with `extra="forbid"`, which means any typo or unknown key produces
a clear error message before your gateway starts.

VoiceGateway searches for the config file in this order:

1. `./voicegw.yaml` (current directory)
2. `~/.config/voicegateway/voicegw.yaml`
3. `/etc/voicegateway/voicegw.yaml`

You can override this with the `VOICEGW_CONFIG` environment
variable. See [Environment variables](/configuration/environment-variables).
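
The lookup order above can be sketched as follows. This is a minimal illustration, not VoiceGateway's actual loader: the helper name `find_config` is hypothetical, and the sketch assumes that `VOICEGW_CONFIG`, when set, takes precedence over all three default locations.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default search locations, in the order listed above.
SEARCH_PATHS = (
    Path("voicegw.yaml"),
    Path("~/.config/voicegateway/voicegw.yaml"),
    Path("/etc/voicegateway/voicegw.yaml"),
)


def find_config(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the config path to load, or None (hypothetical helper)."""
    override = env.get("VOICEGW_CONFIG")
    if override:
        # Assumption: an explicit override wins and is returned as-is.
        return Path(override).expanduser()
    for candidate in SEARCH_PATHS:
        path = candidate.expanduser()
        if path.is_file():
            return path
    return None
```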

## Top-level sections

The config file has thirteen top-level sections. All are optional.

| Section         | Purpose                                             |
| --------------- | --------------------------------------------------- |
| `providers`     | API keys and settings for each provider             |
| `models`        | Register custom model aliases                       |
| `stacks`        | Named bundles of STT + LLM + TTS models             |
| `projects`      | Per-project tracking and budgets                    |
| `fallbacks`     | Ordered fallback chains per modality                |
| `observability` | Toggle latency, cost, and logging middleware        |
| `cost_tracking` | SQLite database settings for cost persistence       |
| `latency`       | TTFB warning thresholds and percentile config       |
| `rate_limits`   | Per-provider request rate limits                    |
| `ingest`        | Rate limits for the fleet collector ingest endpoint |
| `retention`     | Age-out policy for collector data                   |
| `workers`       | Background rollup and retention cadence             |
| `serve`         | Bind host and port for the daemon                   |

***

## `providers`

Configure credentials and settings for each provider. Keys are
provider names matching VoiceGateway's built-in provider
identifiers.

```yaml
providers:
  deepgram:
    api_key: ${DEEPGRAM_API_KEY}
  openai:
    api_key: ${OPENAI_API_KEY}
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
  groq:
    api_key: ${GROQ_API_KEY}
  cartesia:
    api_key: ${CARTESIA_API_KEY}
  elevenlabs:
    api_key: ${ELEVENLABS_API_KEY}
  assemblyai:
    api_key: ${ASSEMBLYAI_API_KEY}
  ollama:
    base_url: http://localhost:11434
  whisper:
    enabled: true
  kokoro:
    enabled: true
  piper:
    enabled: true
```

Each provider supports at minimum:

* `api_key` (string): API key, typically via `${ENV_VAR}` substitution.
* `base_url` (string): override the default API endpoint.
* `enabled` (bool, default `true`): disable a provider without
  removing its config.

See [Providers](/configuration/providers) for per-provider
details.

***

## `models`

Register custom model aliases organised by modality. Each entry
maps an alias to a `provider` and `model` name, with optional
defaults.

```yaml
models:
  stt:
    fast-transcription:
      provider: deepgram
      model: nova-3
    offline-transcription:
      provider: whisper
      model: large-v3
  llm:
    reasoning:
      provider: anthropic
      model: claude-sonnet-4-5
  tts:
    narrator:
      provider: cartesia
      model: sonic-3
      default_voice: narrator-male
```

See [Models](/configuration/models).

***

## `stacks`

Named bundles that map to one STT, one LLM, and one TTS model. Use
stacks to define preset quality / cost tiers.

```yaml
stacks:
  premium:
    stt: deepgram/nova-3
    llm: anthropic/claude-sonnet-4-5
    tts: cartesia/sonic-3
  budget:
    stt: groq/whisper-large-v3
    llm: groq/llama-3.3-70b-versatile
    tts: local/piper:en_US-lessac-medium
  local:
    stt: local/whisper-large-v3
    llm: ollama/llama3.2:3b
    tts: local/kokoro
```

See [Stacks](/configuration/stacks).

***

## `projects`

Define projects for cost attribution and budget enforcement. Each
project can override providers per-key.

```yaml
projects:
  customer-support:
    name: Customer Support Bot
    description: Production support agent
    default_stack: premium
    daily_budget: 50.00
    budget_action: throttle
    tags: [prod, support]
    providers:
      deepgram:
        api_key: ${SUPPORT_DEEPGRAM_KEY}
      anthropic:
        api_key: ${SUPPORT_ANTHROPIC_KEY}
  internal-qa:
    name: Internal QA Bot
    description: Testing and QA agent
    default_stack: budget
    daily_budget: 10.00
    budget_action: warn
    tags: [dev, qa]

default_project: customer-support
```

`budget_action` is one of `warn`, `throttle`, or `block`.
Project-scoped `providers` override the top-level `providers` for
that project; otherwise the top-level keys apply.
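
The override rule can be pictured as a key-by-key merge. The sketch below is one reading of that rule, not VoiceGateway's actual resolver: the helper name `effective_providers` and the sample values are hypothetical, and it assumes a project key replaces only the matching top-level key while leaving the provider's other settings in place.

```python
from typing import Any, Dict

ProviderMap = Dict[str, Dict[str, Any]]


def effective_providers(top_level: ProviderMap, project: ProviderMap) -> ProviderMap:
    """Merge provider settings key by key; project values win (hypothetical helper)."""
    # Copy each provider dict so the top-level config is not mutated.
    merged: ProviderMap = {name: dict(cfg) for name, cfg in top_level.items()}
    for name, overrides in project.items():
        merged.setdefault(name, {}).update(overrides)
    return merged


# Illustrative values only.
top = {
    "deepgram": {"api_key": "global-dg", "base_url": "https://dg.example"},
    "openai": {"api_key": "global-oa"},
}
support = {"deepgram": {"api_key": "support-dg"}}

result = effective_providers(top, support)
```

Here `result["deepgram"]` carries the project's `api_key` together with the top-level `base_url`, and `openai` is unchanged.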

See [Projects](/configuration/projects).

***

## `fallbacks`

Ordered lists of model ids per modality. The resolver treats each
list as a hint: at startup it walks the list and picks the first
model whose provider plugin imports cleanly.

```yaml
fallbacks:
  stt:
    - deepgram/nova-3
    - openai/whisper-1
    - local/whisper-large-v3
  llm:
    - anthropic/claude-sonnet-4-5
    - openai/gpt-4.1-mini
    - ollama/llama3.2:3b
  tts:
    - cartesia/sonic-3
    - elevenlabs/eleven_multilingual_v2
    - local/kokoro
```
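
The "first provider that imports cleanly" walk can be sketched like this. It is an illustration under assumptions, not the gateway's resolver: `first_importable` and the provider-to-module mapping are hypothetical, and standard-library modules stand in for real provider plugins.

```python
import importlib
from typing import Mapping, Optional, Sequence


def first_importable(
    model_ids: Sequence[str], plugin_modules: Mapping[str, str]
) -> Optional[str]:
    """Return the first model id whose provider plugin module imports cleanly."""
    for model_id in model_ids:
        provider = model_id.split("/", 1)[0]
        module_name = plugin_modules.get(provider)
        if module_name is None:
            continue  # No known plugin for this provider.
        try:
            importlib.import_module(module_name)
        except ImportError:
            continue  # Plugin (or one of its dependencies) is not installed.
        return model_id
    return None


# Hypothetical mapping; "json" and "csv" stand in for installed plugins.
PLUGINS = {"deepgram": "not_installed_plugin_xyz", "openai": "json", "local": "csv"}
chosen = first_importable(
    ["deepgram/nova-3", "openai/whisper-1", "local/whisper-large-v3"], PLUGINS
)
```

With this mapping the first entry is skipped because its module cannot be imported, so the second entry is chosen.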

***

## `observability`

Three boolean flags that control which middleware runs. All default
to `true`.

```yaml
observability:
  latency_tracking: true
  cost_tracking: true
  request_logging: true
```

See [Observability](/configuration/observability).

***

## `cost_tracking`

Configure the SQLite storage backend for cost persistence.

```yaml
cost_tracking:
  enabled: true
  db_path: ~/.config/voicegateway/voicegw.db
  daily_budget_alert: 100.00
```

* `enabled` (bool, default `false`): enable cost persistence. Also
  enabled automatically if `VOICEGW_DB_PATH` is set.
* `db_path` (string): path to the SQLite database file.
* `daily_budget_alert` (float, optional): global daily budget alert
  threshold.

***

## `latency`

Configure latency monitoring thresholds.

```yaml
latency:
  ttfb_warning_ms: 500.0
  percentiles: [50.0, 95.0, 99.0]
```

* `ttfb_warning_ms` (float, default `500.0`): time-to-first-byte
  warning threshold in milliseconds.
* `percentiles` (list of floats): which percentiles to track and
  report.
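
To make the two settings concrete, the sketch below computes the configured percentiles over a handful of TTFB samples and flags the ones above the warning threshold. The nearest-rank percentile method, the helper names, and the sample values are assumptions for illustration; this page does not state how VoiceGateway computes percentiles internally.

```python
import math
from typing import Dict, List, Sequence, Tuple


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize(
    ttfb_ms: Sequence[float],
    percentiles: Sequence[float] = (50.0, 95.0, 99.0),
    ttfb_warning_ms: float = 500.0,
) -> Tuple[Dict[str, float], List[float]]:
    """Return (percentile report, samples exceeding the warning threshold)."""
    report = {f"p{pct:g}": percentile(ttfb_ms, pct) for pct in percentiles}
    slow = [value for value in ttfb_ms if value > ttfb_warning_ms]
    return report, slow


# Illustrative samples in milliseconds.
report, slow = summarize([120.0, 180.0, 240.0, 310.0, 650.0])
```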

***

## `rate_limits`

Per-provider rate limiting.

```yaml
rate_limits:
  deepgram:
    requests_per_minute: 100
  openai:
    requests_per_minute: 60
```

* `requests_per_minute` (int): maximum requests per minute for the
  given provider.

***

## `ingest`

Rate limiting for the fleet collector ingest endpoint (`POST /v1/ingest`),
where remote agents push telemetry. Each caller gets its own token bucket,
keyed by virtual key, falling back to the static API key and then the client IP.

```yaml
ingest:
  enabled: true
  requests_per_minute: 120
  burst: 240
  max_batch_size: 1000
```

* `enabled` (bool, default `true`): turn ingest rate limiting on or off.
* `requests_per_minute` (int, default `120`): sustained per-caller request
  rate. Set to `0` to disable limiting (unlimited).
* `burst` (int, default `240`): token-bucket ceiling, the largest burst a
  caller can send before being throttled.
* `max_batch_size` (int, default `1000`): maximum records in one POST. A
  larger batch is rejected with `413` before any database write.

Over-limit requests get `429` with a `Retry-After` header (integer seconds).
The library's remote sink honors `Retry-After` and retries without dropping the
batch, so transient throttling never loses telemetry.
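
How `requests_per_minute`, `burst`, and `Retry-After` relate can be seen in a small token-bucket sketch. This is a generic illustration, not the collector's implementation: the `TokenBucket` class is hypothetical, and the rounding of `Retry-After` up to whole seconds is an assumption consistent with the integer header described above.

```python
import math
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class TokenBucket:
    requests_per_minute: int = 120
    burst: int = 240
    tokens: float = field(init=False, default=0.0)
    updated_at: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = float(self.burst)  # A new caller starts with a full bucket.

    def allow(self, now: float) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds) for a request at time `now`."""
        if self.requests_per_minute == 0:
            return True, 0  # 0 means unlimited.
        rate = self.requests_per_minute / 60.0  # Tokens refilled per second.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(float(self.burst), self.tokens + elapsed * rate)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0
        # Whole seconds until one token is available again.
        return False, math.ceil((1.0 - self.tokens) / rate)


bucket = TokenBucket(requests_per_minute=120, burst=240)
burst_results = [bucket.allow(now=0.0)[0] for _ in range(240)]
throttled = bucket.allow(now=0.0)
recovered = bucket.allow(now=0.5)
```

With the defaults, a caller can send 240 requests at once, is then throttled, and regains capacity at two requests per second. A per-caller limiter would keep one such bucket per key.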

***

## `retention`

Hard-delete aged rows from the collector database. A background worker prunes
in batches, per project: sessions and their dependent rows (replay, turns,
dead-air, guardrail) are aged by `ended_at`, and requests by `timestamp`.

```yaml
retention:
  enabled: true
  default_days: 90
```

* `enabled` (bool, default `true`): turn retention pruning on or off.
* `default_days` (int, default `90`): age after which a project's rows are
  deleted. Applies to every project that has data.
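
The age check behind `default_days` amounts to comparing a row's timestamp against a cutoff. The sketch below is illustrative only: the helper names are hypothetical, and treating a row exactly at the cutoff as still retained is an assumption.

```python
from datetime import datetime, timedelta, timezone


def retention_cutoff(now: datetime, default_days: int = 90) -> datetime:
    """Rows with a timestamp before the returned value are eligible for deletion."""
    return now - timedelta(days=default_days)


def is_expired(row_time: datetime, now: datetime, default_days: int = 90) -> bool:
    # Assumption: a row exactly at the cutoff is kept.
    return row_time < retention_cutoff(now, default_days)


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
old_session = datetime(2025, 1, 15, tzinfo=timezone.utc)   # ended_at, ~137 days ago
recent_request = datetime(2025, 5, 20, tzinfo=timezone.utc)  # timestamp, 12 days ago
```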

***

## `workers`

Cadence for the collector's background workers: the latency and agent rollups,
and the retention prune. Workers run in-process and are started by the server.
In a multi-replica deployment, set `enabled: false` on every replica except the
one chosen to run them (rollups and prunes are idempotent, but running them on
every replica is wasteful).

```yaml
workers:
  enabled: true
  rollup_interval_seconds: 900
  retention_interval_seconds: 3600
```

* `enabled` (bool, default `true`): start the background workers. When `false`,
  no workers run (the rollup tables stay stale and retention does not prune).
* `rollup_interval_seconds` (int, default `900`): how often the latency and
  agent rollups refresh. The Agents dashboard list reads from this 24h rollup.
* `retention_interval_seconds` (int, default `3600`): how often retention runs.

***

## `serve`

Bind host and port for the daemon. The daemon serves the HTTP API
(`/v1/*`), the dashboard API (`/api/*`), and the React SPA (`/`)
all on this single port.

```yaml
serve:
  host: 0.0.0.0
  port: 8080
```

* `host` (string, default `0.0.0.0`): bind address. Use `127.0.0.1`
  to restrict to localhost.
* `port` (int, default `8080`): port number. The wizard collects
  this as question 4 of [`voicegw onboard`](/cli/onboard).

***

## Environment variable substitution

Any string value in the config can use `${ENV_VAR}` syntax.
VoiceGateway substitutes these at load time using `os.environ`.

```yaml
providers:
  deepgram:
    api_key: ${DEEPGRAM_API_KEY}
```

If the environment variable is not set, it resolves to an empty
string.
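
The substitution behaviour can be approximated in a few lines. This is a sketch rather than VoiceGateway's own loader: the function name `substitute_env` and the exact pattern for variable names are assumptions.

```python
import os
import re
from typing import Mapping

# Matches ${NAME}, where NAME is a conventional environment variable name.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: str, env: Mapping[str, str] = os.environ) -> str:
    """Replace each ${NAME} with env[NAME], or "" when NAME is unset."""
    return _ENV_REF.sub(lambda match: env.get(match.group(1), ""), value)
```

Because an unset variable becomes an empty string rather than an error, a missing key may only surface later, for example as an authentication failure from the provider.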

See [Environment variables](/configuration/environment-variables).
