Local-Only Deployment
Run VoiceGateway entirely on local hardware with zero cloud dependencies. Uses Ollama for LLM, Whisper for STT, and Kokoro for TTS. Ideal for air-gapped environments, development without API keys, or privacy-sensitive deployments.
Prerequisites
Install Ollama
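Once Ollama is installed and running, it can be useful to confirm that the server is reachable and see which models have been pulled. The following is a minimal sketch using only the Python standard library. It assumes Ollama is listening on its default address (`http://localhost:11434`) and queries Ollama's `/api/tags` endpoint; the helper name `list_local_models` is hypothetical and not part of VoiceGateway.

```python
import json
import urllib.error
import urllib.request

# Assumption: Ollama's default listen address. Adjust if yours differs.
OLLAMA_URL = "http://localhost:11434"


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 3.0):
    """Return the names of locally pulled models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON response.
        return None
    return [entry["name"] for entry in payload.get("models", [])]
```

A return value of `None` suggests the server is not running, while an empty list suggests it is running but no model has been pulled yet.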
Install VoiceGateway with Local Providers
Whisper requires torch and will download model weights on first use. Kokoro requires the kokoro package.
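Before starting the gateway, a quick check that the local-provider dependencies are importable can save a confusing failure later. This is a sketch under stated assumptions: the package names `torch` and `kokoro` come from the note above, while the helper names `missing_packages` and `torch_device` are hypothetical. The device check uses PyTorch's `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
import importlib.util

# Package names taken from the note above; extend for other local providers.
REQUIRED_PACKAGES = ("torch", "kokoro")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def torch_device() -> str:
    """Best-effort guess at the fastest device torch can use."""
    import torch  # imported lazily so missing_packages() works without torch

    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older torch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages found; torch device:", torch_device())
```

If the reported device is `cpu`, expect latencies toward the slow end of the ranges for local models.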
Configuration
Create voicegw.yaml:
Basic Usage
LiveKit Agent with Local Models
Docker Compose with Ollama
For a containerized local-only setup:
Update voicegw.yaml to point Ollama at the container:
Using Piper TTS as an Alternative
If Kokoro is not available, Piper is another local TTS option:
Performance Considerations
Local models have different performance characteristics than cloud APIs:

| Metric | Cloud (Deepgram + GPT-4.1) | Local (Whisper + Qwen2.5) |
|---|---|---|
| STT TTFB | ~100-200ms | ~500-2000ms (depends on GPU) |
| LLM TTFB | ~200-500ms | ~300-3000ms (depends on model size) |
| TTS TTFB | ~100-300ms | ~200-1000ms |
| Cost | ~$0.01-0.05/request | $0.00 |
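The ranges in the table are broad, so it is worth measuring time to first byte (TTFB) on your own hardware. The sketch below is provider-agnostic: it times how long any streaming iterable takes to yield its first item. The helper `measure_ttfb` is hypothetical and does not use any VoiceGateway API; `start_stream` stands in for whatever call starts your STT, LLM, or TTS stream.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def measure_ttfb(start_stream: Callable[[], Iterable[T]]) -> Tuple[float, Iterator[T]]:
    """Start a stream and return (seconds until the first item, iterator over all items)."""
    started = time.perf_counter()
    iterator = iter(start_stream())
    try:
        first = next(iterator)
    except StopIteration:
        # The stream produced nothing; report the elapsed time anyway.
        return time.perf_counter() - started, iter(())
    ttfb = time.perf_counter() - started

    def replay() -> Iterator[T]:
        # Hand back the first item so the caller sees the complete stream.
        yield first
        yield from iterator

    return ttfb, replay()
```

Run it several times and discard the first measurement, since the first request after startup often includes model loading.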
- GPU acceleration: ensure CUDA/Metal is available for Whisper and Ollama
- Smaller models: use local/whisper-base instead of local/whisper-large-v3 for faster STT
- Quantized LLMs: Ollama automatically uses quantized models (Q4_0, Q4_K_M)
- Keep models warm: Ollama keeps the most recent model in memory; avoid switching frequently
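One way to keep a model warm is to preload it before the first call and ask Ollama to hold it in memory longer. The sketch below assumes Ollama's documented behavior that a `POST /api/generate` request with no prompt loads the model, and that the `keep_alive` field controls how long it stays loaded. The model tag `qwen2.5`, the 30-minute duration, and the helper names are illustrative assumptions, not VoiceGateway settings.

```python
import json
import urllib.request


def build_preload_request(
    model: str,
    keep_alive: str = "30m",
    base_url: str = "http://localhost:11434",
) -> urllib.request.Request:
    """Build a request that loads `model` without generating any text."""
    body = json.dumps({"model": model, "keep_alive": keep_alive}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload_model(model: str, keep_alive: str = "30m") -> bool:
    """Send the preload request; return True if Ollama answered with HTTP 200."""
    request = build_preload_request(model, keep_alive)
    # Loading large weights from disk can be slow, hence the generous timeout.
    with urllib.request.urlopen(request, timeout=120) as resp:
        return resp.status == 200
```

Calling `preload_model("qwen2.5")` at startup moves the model-loading delay out of the first user request.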