Providers

VoiceGateway supports 11 providers across cloud and local deployments. Each provider extends the BaseProvider interface and is instantiated lazily on first use.
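Lazy instantiation means a provider object is built only when something first asks for it, then reused. The sketch below illustrates that pattern under assumptions: `LazyProviderRegistry`, its methods, and the stand-in `BaseProvider` constructor are hypothetical names for illustration, not VoiceGateway's actual classes.

```python
from typing import Callable, Dict


class BaseProvider:
    """Stand-in for the BaseProvider interface (hypothetical constructor)."""

    def __init__(self, config: dict) -> None:
        self.config = config


class LazyProviderRegistry:
    """Builds each provider on first request, then caches the instance."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], BaseProvider]] = {}
        self._instances: Dict[str, BaseProvider] = {}

    def register(self, name: str, factory: Callable[[], BaseProvider]) -> None:
        # Only the factory is stored here; nothing is constructed yet.
        self._factories[name] = factory

    def get(self, name: str) -> BaseProvider:
        # Construct on first use, then return the cached instance afterwards.
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]

    def is_loaded(self, name: str) -> bool:
        return name in self._instances
```

With this shape, a provider that is configured but never used costs nothing at startup, and repeated calls to `get` return the same object.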

Cloud providers

Deepgram

Modalities: STT, TTS
Required config: api_key
Recommended models:
- STT: deepgram/nova-3 (best accuracy), deepgram/nova-2 (lower cost)
- TTS: deepgram/aura-asteria-en
Pricing notes: Pay-per-second for STT, pay-per-character for TTS. Nova-3 is priced higher than Nova-2 but offers better accuracy.

providers:
  deepgram:
    api_key: ${DEEPGRAM_API_KEY}

OpenAI

Modalities: STT, LLM, TTS
Required config: api_key
Recommended models:
- STT: openai/whisper-1
- LLM: openai/gpt-4.1-mini (balanced), openai/gpt-4.1 (best quality)
- TTS: openai/tts-1 (fast), openai/tts-1-hd (high quality)
Pricing notes: Pricing varies by model. GPT-4.1-mini offers a good cost/quality balance for voice agents.

providers:
  openai:
    api_key: ${OPENAI_API_KEY}

Anthropic

Modalities: LLM
Required config: api_key
Recommended models:
- LLM: anthropic/claude-sonnet-4-5 (balanced), anthropic/claude-opus-4-1 (highest quality)
Pricing notes: Per-token pricing. Check Anthropic’s pricing page for current rates.

providers:
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}

Groq

Modalities: STT, LLM
Required config: api_key
Recommended models:
- STT: groq/whisper-large-v3
- LLM: groq/llama-3.3-70b-versatile, groq/llama-3.1-8b-instant
Pricing notes: Very fast inference at competitive pricing. The Whisper endpoint is significantly cheaper than OpenAI’s hosted Whisper.

providers:
  groq:
    api_key: ${GROQ_API_KEY}

Cartesia

Modalities: TTS
Required config: api_key
Recommended models:
- TTS: cartesia/sonic-3 (latest, best quality)
Pricing notes: Pay-per-character. Known for low-latency streaming TTS.

providers:
  cartesia:
    api_key: ${CARTESIA_API_KEY}

ElevenLabs

Modalities: TTS
Required config: api_key
Recommended models:
- TTS: elevenlabs/eleven_multilingual_v2, elevenlabs/eleven_turbo_v2_5
Pricing notes: Per-character pricing with monthly quotas depending on plan. Multilingual v2 supports 29 languages.

providers:
  elevenlabs:
    api_key: ${ELEVENLABS_API_KEY}

AssemblyAI

Modalities: STT
Required config: api_key
Recommended models:
- STT: assemblyai/universal-2 (single-tier model)
Pricing notes: Per-second pricing. Offers real-time streaming and batch transcription.

providers:
  assemblyai:
    api_key: ${ASSEMBLYAI_API_KEY}

Local providers

Local providers run on your own hardware with no API keys required. They are useful for development, privacy-sensitive deployments, and offline operation.

Whisper

Modalities: STT
Required config: None (downloads model on first use)
Recommended models:
- STT: local/whisper-large-v3 (best accuracy), local/whisper-base (fastest)
Notes: Runs OpenAI Whisper locally via faster-whisper. Requires a capable CPU or GPU.

providers:
  whisper:
    enabled: true

Ollama

Modalities: LLM
Required config: base_url (defaults to http://localhost:11434)
Recommended models:
- LLM: ollama/llama3.2:3b, ollama/mistral:7b, ollama/phi3:mini
Notes: Requires a running Ollama server. Models are pulled on first use. Use docker compose --profile local up -d to start Ollama alongside VoiceGateway.

providers:
  ollama:
    base_url: http://localhost:11434

Kokoro

Modalities: TTS
Required config: None
Recommended models:
- TTS: local/kokoro
Notes: Lightweight local TTS. Good for development and testing.

providers:
  kokoro:
    enabled: true

Piper

Modalities: TTS
Required config: None
Recommended models:
- TTS: local/piper:en_US-lessac-medium, local/piper:en_US-amy-low (the voice ID follows the colon)
Notes: Fast offline TTS using ONNX models. Supports multiple languages and voices. Voice models are downloaded on first use.

providers:
  piper:
    enabled: true

Provider modality matrix

Provider	STT	LLM	TTS	Type
Deepgram	Yes	—	Yes	Cloud
OpenAI	Yes	Yes	Yes	Cloud
Anthropic	—	Yes	—	Cloud
Groq	Yes	Yes	—	Cloud
Cartesia	—	—	Yes	Cloud
ElevenLabs	—	—	Yes	Cloud
AssemblyAI	Yes	—	—	Cloud
Whisper	Yes	—	—	Local
Ollama	—	Yes	—	Local
Kokoro	—	—	Yes	Local
Piper	—	—	Yes	Local
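The matrix above can also be written as a lookup table, which is handy when checking which providers can fill a given slot in a pipeline. This is only an illustrative sketch: the `MODALITIES` dict, the `LOCAL` set, and `providers_for` are hypothetical helpers that transcribe the table, not part of VoiceGateway.

```python
# Transcription of the provider modality matrix (hypothetical helper data).
MODALITIES = {
    "deepgram": {"stt", "tts"},
    "openai": {"stt", "llm", "tts"},
    "anthropic": {"llm"},
    "groq": {"stt", "llm"},
    "cartesia": {"tts"},
    "elevenlabs": {"tts"},
    "assemblyai": {"stt"},
    "whisper": {"stt"},
    "ollama": {"llm"},
    "kokoro": {"tts"},
    "piper": {"tts"},
}

# Providers listed as "Local" in the matrix; every other provider is cloud.
LOCAL = {"whisper", "ollama", "kokoro", "piper"}


def providers_for(modality: str, local_only: bool = False) -> list:
    """Return the sorted provider names that support the given modality."""
    return sorted(
        name
        for name, supported in MODALITIES.items()
        if modality in supported and (not local_only or name in LOCAL)
    )
```

For example, `providers_for("tts", local_only=True)` lists the TTS options that run without an API key.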

Per-project provider keys

The top-level providers block sets the default keys. Each project under projects: can override the providers it uses by declaring its own providers block:

providers:
  openai:
    api_key: ${DEFAULT_OPENAI_KEY}

projects:
  tonys-pizza:
    name: Tony's Pizza
    providers:
      openai:
        api_key: ${TONYS_OPENAI_KEY}  # overrides for this project

The inference factories pick the right key automatically based on the active project (set via default_project, the set_project helper from voicegateway.core.active_project, or a virtual key’s project binding).
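The override behaviour can be pictured as a merge of the top-level block with the active project's block. The following is a minimal sketch operating on the already-parsed config as a plain dict; `resolve_provider_config` is a hypothetical helper, and the field-by-field merge is an assumption (the actual factories might instead replace the whole provider block).

```python
from typing import Optional


def resolve_provider_config(
    config: dict, provider: str, project: Optional[str] = None
) -> dict:
    """Return the provider settings that apply to the given project."""
    # Start from the top-level defaults for this provider.
    merged = dict(config.get("providers", {}).get(provider, {}))
    if project is not None:
        project_cfg = config.get("projects", {}).get(project, {})
        # Assumption: project-level fields override defaults one by one.
        merged.update(project_cfg.get("providers", {}).get(provider, {}))
    return merged


config = {
    "providers": {"openai": {"api_key": "default-key"}},
    "projects": {
        "tonys-pizza": {
            "name": "Tony's Pizza",
            "providers": {"openai": {"api_key": "tonys-key"}},
        }
    },
}
```

Here `resolve_provider_config(config, "openai", "tonys-pizza")` yields the project key, while calling it without a project, or with a project that declares no override, falls back to the default.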

DB-managed providers

Beyond YAML, providers can be added at runtime via the MCP server or the dashboard. These entries are stored as rows in the managed_providers table, with their API keys Fernet-encrypted using VOICEGW_SECRET. At runtime, YAML providers (top-level and per-project) are resolved first; DB-managed providers fill in only the entries that YAML does not define.
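The resolution order amounts to a two-step lookup in which YAML always wins. The sketch below shows only that ordering: `resolve_provider` is a hypothetical function, both sources are modelled as plain dicts, and the Fernet decryption of stored keys is deliberately left out.

```python
from typing import Tuple


def resolve_provider(
    name: str, yaml_providers: dict, db_providers: dict
) -> Tuple[str, dict]:
    """Return (source, settings), preferring YAML over DB-managed entries."""
    # YAML entries (top-level and per-project, already merged) take priority.
    if name in yaml_providers:
        return "yaml", yaml_providers[name]
    # DB-managed rows are consulted only when YAML has no entry.
    if name in db_providers:
        return "db", db_providers[name]
    raise KeyError(f"provider {name!r} is not configured")
```

A consequence of this ordering is that a provider added through the dashboard does not replace one already defined in YAML under the same name.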

Common configuration options

Provider entries share these fields (local providers do not need api_key):

api_key (string): API key, typically via ${ENV_VAR} substitution.
base_url (string): override the default API endpoint.
enabled (bool, default true): disable a provider without removing its config.
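The ${ENV_VAR} form keeps secrets out of the config file by reading them from the environment when the config is loaded. As a rough sketch of how such substitution can work, the hypothetical `substitute_env` helper below expands each placeholder; raising an error for an unset variable is an assumption made here, and VoiceGateway's loader may behave differently.

```python
import os
import re
from typing import Mapping, Optional

# Matches placeholders such as ${DEEPGRAM_API_KEY}.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace every ${NAME} in value with the matching environment value."""
    source = os.environ if env is None else env

    def replace(match: "re.Match") -> str:
        name = match.group(1)
        if name not in source:
            # Assumption: fail loudly rather than leave an empty key.
            raise KeyError(f"environment variable {name} is not set")
        return source[name]

    return _ENV_PATTERN.sub(replace, value)
```

Passing an explicit mapping as `env` makes the helper easy to exercise without touching the real process environment.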

See also: the voicegw.yaml reference and Models.
