Providers
VoiceGateway supports 11 providers across cloud and local deployments. Each provider extends the `BaseProvider` interface and
is instantiated lazily on first use.
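To illustrate what "instantiated lazily on first use" can look like, here is a minimal sketch. The `LazyProviderRegistry` class and the stand-in `BaseProvider` below are hypothetical and written only for this example; VoiceGateway's actual interface and registry may differ.

```python
from typing import Callable, Dict


class BaseProvider:
    """Stand-in for the real BaseProvider interface (assumed, not the actual class)."""

    def __init__(self, name: str) -> None:
        self.name = name


class LazyProviderRegistry:
    """Hypothetical registry: stores factories and builds a provider on first request."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], BaseProvider]] = {}
        self._instances: Dict[str, BaseProvider] = {}

    def register(self, name: str, factory: Callable[[], BaseProvider]) -> None:
        # Registration stores the factory only; nothing is constructed yet.
        self._factories[name] = factory

    def get(self, name: str) -> BaseProvider:
        # Construct on first use, then reuse the cached instance.
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]

    def is_instantiated(self, name: str) -> bool:
        return name in self._instances


registry = LazyProviderRegistry()
registry.register("deepgram", lambda: BaseProvider("deepgram"))
```

With this shape, a provider that is configured but never used costs nothing at startup, and repeated lookups return the same instance.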
Cloud providers
Deepgram
- Modalities: STT, TTS
- Required config: `api_key`
- Recommended models:
  - STT: `deepgram/nova-3` (best accuracy), `deepgram/nova-2` (lower cost)
  - TTS: `deepgram/aura-asteria-en`
- Pricing notes: Pay-per-second for STT, pay-per-character for TTS. Nova-3 is priced higher than Nova-2 but offers better accuracy.
OpenAI
- Modalities: STT, LLM, TTS
- Required config: `api_key`
- Recommended models:
  - STT: `openai/whisper-1`
  - LLM: `openai/gpt-4.1-mini` (balanced), `openai/gpt-4.1` (best quality)
  - TTS: `openai/tts-1` (fast), `openai/tts-1-hd` (high quality)
- Pricing notes: Different pricing tiers per model. GPT-4.1-mini offers a good cost / quality balance for voice agents.
Anthropic
- Modalities: LLM
- Required config: `api_key`
- Recommended models:
  - LLM: `anthropic/claude-sonnet-4-5` (balanced), `anthropic/claude-opus-4-1` (highest quality)
- Pricing notes: Per-token pricing. Check Anthropic’s pricing page for current rates.
Groq
- Modalities: STT, LLM
- Required config: `api_key`
- Recommended models:
  - STT: `groq/whisper-large-v3`
  - LLM: `groq/llama-3.3-70b-versatile`, `groq/llama-3.1-8b-instant`
- Pricing notes: Very fast inference at competitive pricing. The Whisper endpoint is significantly cheaper than OpenAI’s hosted Whisper.
Cartesia
- Modalities: TTS
- Required config: `api_key`
- Recommended models:
  - TTS: `cartesia/sonic-3` (latest, best quality)
- Pricing notes: Pay-per-character. Known for low-latency streaming TTS.
ElevenLabs
- Modalities: TTS
- Required config: `api_key`
- Recommended models:
  - TTS: `elevenlabs/eleven_multilingual_v2`, `elevenlabs/eleven_turbo_v2_5`
- Pricing notes: Per-character pricing with monthly quotas depending on plan. Multilingual v2 supports 29 languages.
AssemblyAI
- Modalities: STT
- Required config: `api_key`
- Recommended models:
  - STT: `assemblyai/universal-2` (single-tier model)
- Pricing notes: Per-second pricing. Offers real-time streaming and batch transcription.
Local providers
Local providers run on your own hardware with no API keys required. They are useful for development, privacy-sensitive deployments, and offline operation.
Whisper
- Modalities: STT
- Required config: None (downloads model on first use)
- Recommended models:
  - STT: `local/whisper-large-v3` (best accuracy), `local/whisper-base` (fastest)
- Notes: Runs OpenAI Whisper locally via faster-whisper. Requires a capable CPU or GPU.
Ollama
- Modalities: LLM
- Required config: `base_url` (defaults to `http://localhost:11434`)
- Recommended models:
  - LLM: `ollama/llama3.2:3b`, `ollama/mistral:7b`, `ollama/phi3:mini`
- Notes: Requires a running Ollama server. Models are pulled on first use. Use `docker compose --profile local up -d` to start Ollama alongside VoiceGateway.
Kokoro
- Modalities: TTS
- Required config: None
- Recommended models:
  - TTS: `local/kokoro`
- Notes: Lightweight local TTS. Good for development and testing.
Piper
- Modalities: TTS
- Required config: None
- Recommended models:
  - TTS: `local/piper:en_US-lessac-medium`, `local/piper:en_US-amy-low` (the voice ID follows the `:`)
- Notes: Fast offline TTS using ONNX models. Supports multiple languages and voices. Voice models are downloaded on first use.
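The model IDs listed above follow a `prefix/model` shape, with Piper adding a voice ID after a colon. The sketch below shows one way such IDs could be split apart. The `parse_model_id` function and `ModelRef` type are hypothetical helpers written for this example, not part of VoiceGateway, and the rule that only Piper IDs carry a voice suffix is an assumption drawn from the examples on this page.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    prefix: str           # text before the first "/", e.g. "deepgram" or "local"
    model: str            # text after the first "/"
    voice: Optional[str]  # Piper voice ID, if present


def parse_model_id(model_id: str) -> ModelRef:
    """Split an ID such as 'local/piper:en_US-lessac-medium' into its parts."""
    prefix, sep, rest = model_id.partition("/")
    if not sep or not prefix or not rest:
        raise ValueError(f"expected 'prefix/model', got {model_id!r}")
    # Assumption: only Piper IDs carry a voice after ":". Ollama tags such as
    # "llama3.2:3b" keep their colon as part of the model name.
    if prefix == "local" and rest.startswith("piper:"):
        model, _, voice = rest.partition(":")
        return ModelRef(prefix, model, voice or None)
    return ModelRef(prefix, rest, None)
```

Treating the colon as provider-specific avoids misreading an Ollama tag like `ollama/llama3.2:3b` as a model plus a voice.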
Provider modality matrix
| Provider | STT | LLM | TTS | Type |
|---|---|---|---|---|
| Deepgram | Yes | — | Yes | Cloud |
| OpenAI | Yes | Yes | Yes | Cloud |
| Anthropic | — | Yes | — | Cloud |
| Groq | Yes | Yes | — | Cloud |
| Cartesia | — | — | Yes | Cloud |
| ElevenLabs | — | — | Yes | Cloud |
| AssemblyAI | Yes | — | — | Cloud |
| Whisper | Yes | — | — | Local |
| Ollama | — | Yes | — | Local |
| Kokoro | — | — | Yes | Local |
| Piper | — | — | Yes | Local |
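The matrix can also be expressed as data, which is convenient when choosing a provider for a given modality in a script. The following is a sketch that transcribes the table above; the `MODALITIES` mapping, `LOCAL` set, and `providers_for` helper are names made up for this example and are not a VoiceGateway API.

```python
from typing import Dict, List, Set

# Transcribed from the provider modality matrix above.
MODALITIES: Dict[str, Set[str]] = {
    "Deepgram": {"STT", "TTS"},
    "OpenAI": {"STT", "LLM", "TTS"},
    "Anthropic": {"LLM"},
    "Groq": {"STT", "LLM"},
    "Cartesia": {"TTS"},
    "ElevenLabs": {"TTS"},
    "AssemblyAI": {"STT"},
    "Whisper": {"STT"},
    "Ollama": {"LLM"},
    "Kokoro": {"TTS"},
    "Piper": {"TTS"},
}

# Providers marked "Local" in the Type column.
LOCAL: Set[str] = {"Whisper", "Ollama", "Kokoro", "Piper"}


def providers_for(modality: str, local_only: bool = False) -> List[str]:
    """Return providers that support a modality, in table order."""
    return [
        name
        for name, supported in MODALITIES.items()
        if modality in supported and (not local_only or name in LOCAL)
    ]
```

For example, `providers_for("TTS", local_only=True)` lists the TTS providers that run without an API key.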
Per-project provider keys
The top-level `providers` block sets the default keys. Each project
under `projects:` can override the providers it uses by declaring
its own `providers` block. The override applies to whichever project is
active (selected through `default_project`, the `set_project`
helper from `voicegateway.core.active_project`, or a virtual key’s
project binding).
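As a rough illustration of per-project overrides, the sketch below models the configuration as plain dictionaries and computes the provider settings in effect for one project. The project name `support-bot`, the key values, and the `effective_providers` helper are all invented for this example. Whether VoiceGateway merges overrides field by field or replaces a provider entry wholesale is not stated here, so the field-level merge is an assumption.

```python
from typing import Any, Dict

Config = Dict[str, Any]

# Hypothetical configuration, shaped like the YAML described above.
config: Config = {
    "providers": {
        "deepgram": {"api_key": "default-deepgram-key"},
        "openai": {"api_key": "default-openai-key"},
    },
    "projects": {
        "support-bot": {
            "providers": {"openai": {"api_key": "support-bot-openai-key"}},
        },
    },
}


def effective_providers(cfg: Config, project: str) -> Dict[str, Dict[str, Any]]:
    """Start from the top-level defaults, then apply the project's overrides."""
    merged = {name: dict(settings) for name, settings in cfg.get("providers", {}).items()}
    overrides = cfg.get("projects", {}).get(project, {}).get("providers", {})
    for name, settings in overrides.items():
        # Assumption: overrides are merged per field rather than replacing the entry.
        merged[name] = {**merged.get(name, {}), **settings}
    return merged
```

A project that declares no `providers` block simply inherits the top-level defaults.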
DB-managed providers
Beyond YAML, providers can be added at runtime via the MCP server or the dashboard. These rows live in the `managed_providers` table
with their API keys Fernet-encrypted with `VOICEGW_SECRET`. The
runtime resolution order is: YAML providers (top-level + per-project)
first, then DB-managed providers for any missing entries.
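The resolution order described above can be sketched as a simple two-step lookup. The `resolve_provider` function and the dictionary shapes are hypothetical and exist only for this example. Decryption of the stored keys is left out, so the DB entries here are assumed to be already decrypted.

```python
from typing import Any, Dict, Tuple

ProviderMap = Dict[str, Dict[str, Any]]


def resolve_provider(
    name: str, yaml_providers: ProviderMap, db_providers: ProviderMap
) -> Tuple[str, Dict[str, Any]]:
    """Return (source, settings), preferring YAML and falling back to the DB."""
    # YAML entries (top-level plus per-project, already merged) win.
    if name in yaml_providers:
        return "yaml", yaml_providers[name]
    # DB-managed rows only fill in providers that YAML does not define.
    if name in db_providers:
        return "db", db_providers[name]
    raise KeyError(f"provider {name!r} is not configured")


# Hypothetical data: "openai" exists in both places, "cartesia" only in the DB.
yaml_providers = {"openai": {"api_key": "yaml-openai-key"}}
db_providers = {
    "openai": {"api_key": "db-openai-key"},
    "cartesia": {"api_key": "db-cartesia-key"},
}
```

Under this ordering, adding a provider through the dashboard never shadows one that is already declared in YAML.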
Common configuration options
All providers support these shared fields:
- `api_key` (string): API key, typically via `${ENV_VAR}` substitution.
- `base_url` (string): override the default API endpoint.
- `enabled` (bool, default `true`): disable a provider without removing its config.
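To make the `${ENV_VAR}` substitution and the `enabled` default concrete, here is a small sketch of how a loader might process one provider entry. The `expand_env` and `load_provider` functions are hypothetical, and raising an error for an unset variable is an assumption; VoiceGateway's own loader may handle that case differently.

```python
import os
import re
from typing import Any, Dict, Mapping, Optional

# Matches ${NAME} where NAME is a conventional environment variable name.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${NAME} in value with the matching environment variable."""
    source = os.environ if env is None else env

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in source:
            # Assumption: fail loudly instead of leaving the placeholder in place.
            raise KeyError(f"environment variable {name} is not set")
        return source[name]

    return _ENV_PATTERN.sub(replace, value)


def load_provider(raw: Dict[str, Any], env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Expand string fields and apply the documented default enabled=true."""
    settings = {
        key: expand_env(val, env) if isinstance(val, str) else val
        for key, val in raw.items()
    }
    settings.setdefault("enabled", True)
    return settings
```

Passing `env` explicitly keeps the example testable. Calling `load_provider({"api_key": "${DEEPGRAM_API_KEY}"})` without it reads from the process environment.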