Cost Tracking
VoiceGateway records the cost of every request that flows through it: tokens for LLM, audio seconds for STT, characters for TTS. Cost data lands in SQLite alongside latency metrics and is the source of truth for the dashboard, the voicegw reconcile command, and per-project budget enforcement.
This page covers the cost-tracking subsystem end-to-end: the pricing layer, the per-request flow, and the substitute-validation strategy that backs the streaming cost accuracy claim.
Architecture
Pricing layer
The pricing facade in src/voicegateway/pricing/catalog.py exposes two functions:
calculate_cost dispatches by modality:
- LLM (modality="llm"): uses input_tokens and output_tokens. Routes to pricing/llm.py, which wraps voice-prices. Returns the voice-prices total. pricing_source("llm") is voice-prices@<version>.
- STT (modality="stt"): uses audio_seconds. Routes to pricing/stt.py, which maps the duration onto a voice-prices lookup. pricing_source("stt") is voice-prices@<version>.
- TTS (modality="tts"): uses character_count. Routes to pricing/tts.py, same voice-prices pattern as STT.
- Self-hosted (local/*, ollama/*): priced at $0 by a facade guard, attributed as voicegateway-local.
calculate_cost returns None for unknown models (never a silent zero), so callers can distinguish “free” from “unknown.”
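The dispatch rules above can be sketched roughly as follows. This is a minimal illustration, not the real facade: the function name calculate_cost_sketch, its keyword arguments, the model names, and every rate in _RATES are assumptions made up for the example, and the real implementation delegates to voice-prices rather than a local table.

```python
from decimal import Decimal
from typing import Optional

# Hypothetical per-unit rates, for illustration only (not real prices).
_RATES = {
    ("llm", "example/llm-model"): {"input": Decimal("0.000001"), "output": Decimal("0.000002")},
    ("stt", "example/stt-model"): {"second": Decimal("0.0001")},
    ("tts", "example/tts-model"): {"character": Decimal("0.00001")},
}

# Prefixes named in the text as self-hosted.
_LOCAL_PREFIXES = ("local/", "ollama/")


def calculate_cost_sketch(
    modality: str,
    model: str,
    *,
    input_tokens: int = 0,
    output_tokens: int = 0,
    audio_seconds: float = 0.0,
    character_count: int = 0,
) -> Optional[Decimal]:
    """Return a cost in USD, Decimal("0") for self-hosted, or None if unknown."""
    # Facade guard: self-hosted models are priced at zero before any lookup.
    if model.startswith(_LOCAL_PREFIXES):
        return Decimal("0")

    rates = _RATES.get((modality, model))
    if rates is None:
        # Unknown model: None, never a silent zero.
        return None

    if modality == "llm":
        return input_tokens * rates["input"] + output_tokens * rates["output"]
    if modality == "stt":
        # Go through str() so the float does not leak binary noise into Decimal.
        return Decimal(str(audio_seconds)) * rates["second"]
    if modality == "tts":
        return character_count * rates["character"]
    return None
```

Callers can then treat `cost is None` as "unknown model, do not bill silently" and `cost == 0` as "known to be free".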
A 60-day staleness gate fails CI when any local-catalog entry’s pricing_source_date is older than 60 days, forcing a quarterly refresh.
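A staleness gate of this kind could look like the sketch below. The catalog shape (a dict of entries keyed by model) and the ISO-8601 string format for pricing_source_date are assumptions for illustration; only the field name and the 60-day threshold come from the text above.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=60)


def stale_entries(catalog: dict, today: date) -> list:
    """Return the models whose pricing_source_date is older than 60 days."""
    stale = []
    for model, entry in catalog.items():
        # Assumes dates are stored as "YYYY-MM-DD" strings.
        source_date = date.fromisoformat(entry["pricing_source_date"])
        if today - source_date > MAX_AGE:
            stale.append(model)
    return sorted(stale)


# Hypothetical catalog entries, for illustration only.
example_catalog = {
    "example/fresh-model": {"pricing_source_date": "2025-04-02"},
    "example/stale-model": {"pricing_source_date": "2025-04-01"},
}
```

A CI job would fail the build whenever `stale_entries(...)` returns a non-empty list.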
Per-request flow
Every wrapped request flows through _InstrumentedBase._log_request:
- Compute total latency as now - start_time.
- Compute TTFB as first_byte_time - start_time if the streaming hook fired; otherwise fall back to total latency.
- Build a RequestRecord via CostTracker.create_record(...), which calls into the pricing facade and attaches pricing_source to the record.
- Write to storage via SQLiteStorage.log_request(...). A failure logs at warning and is swallowed; in-memory accounting must not break because the disk is full.
- Notify the budget enforcer via CostTracker.notify_spend(...) so per-project caps stay accurate even during a storage outage.
RequestRecord carries the same pricing_source string the catalog returned, so voicegw reconcile can attribute the recorded number to a specific upstream catalog version.
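The two behaviors in this flow that are easiest to get wrong are the TTFB fallback and the swallowed storage failure. The sketch below illustrates both under stated assumptions: compute_latencies and log_request_sketch are hypothetical helper names, timestamps are assumed to be seconds as floats, and the write and notify_spend callables stand in for SQLiteStorage.log_request and CostTracker.notify_spend, whose real signatures are not shown on this page.

```python
import logging
from typing import Any, Callable, Optional, Tuple

logger = logging.getLogger("voicegateway.sketch")


def compute_latencies(
    start_time: float, first_byte_time: Optional[float], now: float
) -> Tuple[float, float]:
    """Return (ttfb_ms, total_latency_ms) from timestamps in seconds."""
    total_ms = (now - start_time) * 1000.0
    if first_byte_time is None:
        # The streaming hook never fired: fall back to total latency.
        ttfb_ms = total_ms
    else:
        ttfb_ms = (first_byte_time - start_time) * 1000.0
    return ttfb_ms, total_ms


def log_request_sketch(
    record: Any,
    write: Callable[[Any], None],
    notify_spend: Callable[[Any], None],
) -> bool:
    """Persist a record, then notify the budget enforcer regardless of outcome."""
    stored = True
    try:
        write(record)
    except Exception as exc:
        # Log at warning and swallow: a full disk must not break accounting.
        logger.warning("storage write failed: %s", exc)
        stored = False
    # Runs even if storage failed, so per-project caps stay accurate.
    notify_spend(record)
    return stored
```

Placing the notification outside the try block is what keeps budget enforcement independent of storage health.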
How streaming cost accounting is validated
Streaming is where the real-world cost-tracking bugs hide: tokens that double at chunk boundaries, audio-second accumulators that drift, character counts that miss SSML markup. VoiceGateway closes the validation gap without requiring real production traffic.
The substitute strategy
Rather than dogfood the gateway in production and reconcile against provider invoices, VG records real provider streaming responses once via src/voicegateway/tests/fixtures/streaming/record_streaming_fixtures.py and replays them in CI forever. Each fixture is a JSON file with three load-bearing sections:
- request: the literal payload VG sent.
- response_stream: the chunks the provider returned, with received_at_ms timestamps.
- provider_reported_usage: the usage block the provider reported at end-of-stream (tokens for LLM, duration for STT, character count for TTS).
Each fixture also records expected_cost_usd, computed at recording time by passing provider_reported_usage through voicegateway.pricing.catalog.calculate_cost and quantized to 8 decimal places. This locks the cost math at the recording’s price: if the catalog is updated later, the fixture’s expected_cost_usd stays at the price-at-recording. The fixture validates VG’s math, not “today’s price.”
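As a worked illustration of the 8-decimal-place quantization, the sketch below uses Python's decimal module. The rounding mode (ROUND_HALF_UP), the helper name quantize_cost, and the per-second rate are assumptions; the page only states that the value is quantized to 8 decimal places.

```python
from decimal import Decimal, ROUND_HALF_UP

EIGHT_DP = Decimal("0.00000001")


def quantize_cost(cost: Decimal) -> Decimal:
    """Quantize a USD cost to 8 decimal places (rounding mode is assumed)."""
    return cost.quantize(EIGHT_DP, rounding=ROUND_HALF_UP)


# Hypothetical price at recording time, for illustration only.
recorded_rate_per_second = Decimal("0.0000043")
usage = {"audio_seconds": Decimal("12.345")}

# 12.345 * 0.0000043 = 0.0000530835, which quantizes to 0.00005308.
expected_cost_usd = quantize_cost(usage["audio_seconds"] * recorded_rate_per_second)
```

Because expected_cost_usd is stored in the fixture, a later change to the rate does not alter it; only a change in how the cost is computed would.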
Filename convention is locked at <provider>_<model>_<modality>_<mode>_<YYYY-MM-DD>.json. The date drives the staleness check.
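One way to decode that convention is sketched below. It is not the helper in _loader.py: the function name, the FixtureName type, and the example filename are hypothetical, and the sketch assumes the provider segment contains no underscore while the model segment may.

```python
from datetime import date
from typing import NamedTuple


class FixtureName(NamedTuple):
    provider: str
    model: str
    modality: str
    mode: str
    recorded_on: date


def decode_fixture_name(filename: str) -> FixtureName:
    """Split <provider>_<model>_<modality>_<mode>_<YYYY-MM-DD>.json into fields."""
    stem = filename.removesuffix(".json")
    # Parse from the right so underscores inside the model name survive.
    head, modality, mode, day = stem.rsplit("_", 3)
    provider, model = head.split("_", 1)
    return FixtureName(provider, model, modality, mode, date.fromisoformat(day))


# Hypothetical filename, for illustration only.
example = decode_fixture_name("deepgram_nova-3_stt_stream_2025-01-15.json")
```

The decoded recorded_on date is what a staleness check would compare against the current date.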
What the replay tests assert
src/voicegateway/tests/test_streaming_cost_accounting.py parameterizes over every committed fixture and asserts three things per fixture:
- Unit-count consistency: provider_reported_usage agrees with the actual contents of response_stream. For LLM, the normalized input_tokens / output_tokens / total_tokens must equal the values inside the trailing ChatCompletion usage chunk. For STT, audio_seconds must equal Deepgram’s metadata.duration. For TTS, character_count must equal len(request.transcript). Catches recorder field-name typos, provider schema drift, and off-by-one normalization.
- Cost calculation: calculate_cost(provider_reported_usage) quantized to 8 dp must equal fixture.expected_cost_usd quantized to 8 dp. Catches cost-layer regressions (modality-dispatch bugs, pricing-source attribution drift, Decimal precision losses).
- TTFB hook behavior (stream fixtures only): a wrapper that calls _mark_first_byte partway through must produce ttfb_ms < total_latency_ms. A wrapper that never calls it must produce ttfb_ms == total_latency_ms (the documented fallback). Catches modality refactors that forget to wire TTFB.
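The first two assertions can be sketched for a TTS fixture as follows. This is an illustration only: check_tts_fixture, sample_fixture, sample_cost_fn, and the rate are hypothetical, the fixture is reduced to the fields the checks touch, and the real suite loads committed JSON fixtures and calls the actual pricing facade.

```python
from decimal import Decimal

EIGHT_DP = Decimal("0.00000001")


def check_tts_fixture(fixture: dict, cost_fn) -> bool:
    """Run the unit-count and cost assertions against one TTS fixture."""
    usage = fixture["provider_reported_usage"]

    # Assertion 1: reported units agree with what was actually sent.
    assert usage["character_count"] == len(fixture["request"]["transcript"])

    # Assertion 2: recomputed cost matches the recorded cost at 8 dp.
    actual = cost_fn(usage).quantize(EIGHT_DP)
    expected = Decimal(fixture["expected_cost_usd"]).quantize(EIGHT_DP)
    assert actual == expected
    return True


# Hypothetical fixture and rate, for illustration only.
sample_fixture = {
    "request": {"transcript": "hello world"},
    "provider_reported_usage": {"character_count": 11},
    "expected_cost_usd": "0.00011000",
}


def sample_cost_fn(usage: dict) -> Decimal:
    return usage["character_count"] * Decimal("0.00001")
```

In a pytest suite, a function like this would typically be driven by parametrizing over every discovered fixture file.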
src/voicegateway/tests/test_ttfb_hook_coverage.py runs the TTFB-hook contract against synthetic streams for every modality, gated against wrap_provider’s dispatch table so a future modality cannot land without TTFB coverage.
Honest limits of the substitute strategy
Fixture replay is not a complete substitute for production traffic. It does not catch:
- Real-time streaming behavior: replay is sequential and synchronous. We do not simulate network jitter, partial chunks split across TCP packets, or out-of-order delivery.
- Provider-side correctness: if Deepgram’s reported usage is off by 0.1 seconds, the fixture accepts that as ground truth. The suite validates VG’s accounting matches the provider’s, not whether the provider is right.
- Stale fixtures: recorded fixtures capture provider behavior at a point in time. If a provider changes its streaming format, the fixture’s response_stream no longer matches what VG would see today. The filename’s date convention surfaces staleness; a quarterly refresh task is on the maintenance backlog.
- End-to-end LiveKit session validation: the wrappers are tested in isolation, not as part of a real AgentSession. Session-level integration testing is deferred (it sits in the OpenRTC-Python Phase 2 plan).
Where to find each piece
- src/voicegateway/pricing/catalog.py, llm.py, stt.py, tts.py: the pricing layer.
- src/voicegateway/middleware/cost_tracker.py: per-request record builder.
- src/voicegateway/middleware/instrumented_provider.py: _InstrumentedBase + wrap_provider + the TTFB / log_request hooks.
- src/voicegateway/tests/fixtures/streaming/: recorded fixtures, schema, loader.
  - _schema.py: StreamingFixture Pydantic model.
  - _loader.py: discover_fixtures, load_fixture, filename-decode helper.
  - README.md: the fixture format and refresh policy.
  - PLACEHOLDER.md: runbook for recording the six minimum fixtures.
- src/voicegateway/tests/fixtures/streaming/record_streaming_fixtures.py: the dev-only recorder, gated behind --record and --confirm. Its module docstring documents cost expectations and operational warnings.
- src/voicegateway/tests/test_streaming_cost_accounting.py: the three-assertion replay suite.
- src/voicegateway/tests/test_ttfb_hook_coverage.py: per-modality TTFB hardening.