# Cost tracking

VoiceGateway records the cost of every request that flows through it: tokens for LLM, audio seconds for STT, characters for TTS. Cost data lands in SQLite alongside latency metrics and is the source of truth for the dashboard, the `voicegw reconcile` command, and per-project budget enforcement.

This page covers the cost-tracking subsystem end-to-end: the pricing layer, the per-request flow, and the substitute-validation strategy that backs the streaming cost accuracy claim.

## Architecture

```mermaid
graph LR
    subgraph Request["Per-Request Path"]
        WRAP["InstrumentedSTT/LLM/TTS<br/>(transparent proxy)"]
        CT["CostTracker.create_record()"]
        STORE["SQLiteStorage.log_request()"]
        BUDGET["BudgetEnforcer.notify_spend()"]
    end

    subgraph Pricing["Pricing layer (modality dispatch)"]
        FACADE["voicegateway.pricing.catalog<br/>calculate_cost()"]
        LLM["llm.py<br/>(voice-prices wrapper)"]
        STT["stt.py<br/>(voice-prices wrapper)"]
        TTS["tts.py<br/>(voice-prices wrapper)"]
    end

    WRAP --> CT
    CT --> FACADE
    FACADE --> LLM
    FACADE --> STT
    FACADE --> TTS
    CT --> STORE
    CT --> BUDGET
```

## Pricing layer

The pricing facade in `src/voicegateway/pricing/catalog.py` exposes two functions:

```python
calculate_cost(
    modality: str,
    model: str,
    *,
    audio_seconds: float = 0.0,
    input_tokens: int = 0,
    output_tokens: int = 0,
    character_count: int = 0,
) -> Decimal | None

pricing_source(modality: str) -> str
```

`calculate_cost` dispatches by modality:

* **LLM** (`modality="llm"`): uses `input_tokens` and `output_tokens`. Routes to `pricing/llm.py`, which wraps `voice-prices`. Returns the voice-prices total. `pricing_source("llm")` is `voice-prices@<version>`.
* **STT** (`modality="stt"`): uses `audio_seconds`. Routes to `pricing/stt.py`, which maps the duration onto a `voice-prices` lookup. `pricing_source("stt")` is `voice-prices@<version>`.
* **TTS** (`modality="tts"`): uses `character_count`. Routes to `pricing/tts.py`, same `voice-prices` pattern as STT.
* **Self-hosted** (`local/*`, `ollama/*`): priced at `$0` by a facade guard, attributed as `voicegateway-local`.

All three modalities return `None` for unknown models (never silent zero), so callers can distinguish "free" from "unknown."
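
The dispatch rules above can be sketched in a few lines. This is a minimal illustration, not the shipped facade: the function name `calculate_cost_sketch`, the model names, and every price in `_PRICES` are hypothetical stand-ins for what `voice-prices` would supply.

```python
from decimal import Decimal

# Hypothetical per-unit USD prices; the real values come from voice-prices.
_PRICES = {
    ("llm", "example/llm-model"): {"input": Decimal("0.000001"), "output": Decimal("0.000002")},
    ("stt", "example/stt-model"): {"second": Decimal("0.0001")},
    ("tts", "example/tts-model"): {"character": Decimal("0.00001")},
}
_SELF_HOSTED_PREFIXES = ("local/", "ollama/")


def calculate_cost_sketch(modality, model, *, audio_seconds=0.0,
                          input_tokens=0, output_tokens=0, character_count=0):
    """Return a Decimal cost, or None when the model is unknown."""
    # Guard first: self-hosted models are free, which is different from unknown.
    if model.startswith(_SELF_HOSTED_PREFIXES):
        return Decimal("0")
    price = _PRICES.get((modality, model))
    if price is None:
        return None  # unknown model: never a silent zero
    if modality == "llm":
        return price["input"] * input_tokens + price["output"] * output_tokens
    if modality == "stt":
        # Go through str() so the float duration does not leak binary noise.
        return price["second"] * Decimal(str(audio_seconds))
    if modality == "tts":
        return price["character"] * character_count
    return None


free = calculate_cost_sketch("stt", "local/whisper", audio_seconds=30.0)
unknown = calculate_cost_sketch("stt", "example/not-in-catalog", audio_seconds=30.0)
```

Callers should test `is None` rather than truthiness, because `Decimal("0")` is falsy and would otherwise be confused with an unknown model.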

A 60-day staleness gate fails CI when any local-catalog entry's `pricing_source_date` is older than 60 days, which forces a refresh at least every 60 days.
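
A check of this kind might look like the sketch below. It assumes the catalog is a mapping from model name to an entry whose `pricing_source_date` is an ISO `YYYY-MM-DD` string; that layout, the model names, and the helper name `stale_entries` are illustrative assumptions, not the project's actual CI code.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=60)


def stale_entries(catalog, today):
    """Return the model names whose pricing_source_date is older than MAX_AGE."""
    return sorted(
        model
        for model, entry in catalog.items()
        if today - date.fromisoformat(entry["pricing_source_date"]) > MAX_AGE
    )


# Hypothetical catalog entries, for illustration only.
catalog = {
    "example/fresh-model": {"pricing_source_date": "2025-05-01"},
    "example/boundary-model": {"pricing_source_date": "2025-04-02"},  # exactly 60 days old
    "example/stale-model": {"pricing_source_date": "2025-03-01"},
}
stale = stale_entries(catalog, today=date(2025, 6, 1))
```

A CI job could fail the build whenever the returned list is non-empty.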

## Per-request flow

Every wrapped request flows through `_InstrumentedBase._log_request`:

1. **Compute total latency** as `now - start_time`.
2. **Compute TTFB** as `first_byte_time - start_time` if the streaming hook fired; otherwise fall back to total latency.
3. **Build a `RequestRecord`** via `CostTracker.create_record(...)`, which calls into the pricing facade and attaches `pricing_source` to the record.
4. **Write to storage** via `SQLiteStorage.log_request(...)`. A storage failure is logged at warning level and swallowed: in-memory accounting must not break because the disk is full.
5. **Notify the budget enforcer** via `CostTracker.notify_spend(...)` so per-project caps stay accurate even during a storage outage.

Each `RequestRecord` carries the same `pricing_source` string the catalog returned, so `voicegw reconcile` can attribute the recorded number to a specific upstream catalog version.
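
The timing and failure-handling rules in the steps above can be illustrated with a small stand-alone sketch. `RequestTimer` and `log_request_sketch` are hypothetical names chosen for this example; the real logic lives in `_InstrumentedBase._log_request`, whose internals are not shown on this page.

```python
import logging
import time
from dataclasses import dataclass, field
from typing import Optional

logger = logging.getLogger("voicegateway.sketch")


@dataclass
class RequestTimer:
    start_time: float = field(default_factory=time.monotonic)
    first_byte_time: Optional[float] = None

    def mark_first_byte(self):
        # Only the first chunk counts; later calls are ignored.
        if self.first_byte_time is None:
            self.first_byte_time = time.monotonic()

    def latencies_ms(self, now=None):
        """Return (ttfb_ms, total_latency_ms); TTFB falls back to total latency."""
        now = time.monotonic() if now is None else now
        total = (now - self.start_time) * 1000
        if self.first_byte_time is None:
            return total, total
        return (self.first_byte_time - self.start_time) * 1000, total


def log_request_sketch(record, store, notify_spend):
    """Persist the record if possible, then always notify the budget side."""
    try:
        store(record)
    except Exception:
        # A storage failure must not break in-memory accounting.
        logger.warning("storage write failed; continuing", exc_info=True)
    notify_spend(record)
```

Because the notification runs after the `try` block rather than inside it, a failed write still reaches the budget enforcer, which matches the ordering of steps 4 and 5.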

## How streaming cost accounting is validated

Streaming is where the real-world cost-tracking bugs hide: tokens that double at chunk boundaries, audio-second accumulators that drift, character counts that miss SSML markup. VoiceGateway closes the validation gap without requiring real production traffic.

### The substitute strategy

Rather than dogfood the gateway in production and reconcile against provider invoices, VG records real provider streaming responses **once** via `src/voicegateway/tests/fixtures/streaming/record_streaming_fixtures.py` and replays them in CI forever. Each fixture is a JSON file with three load-bearing sections:

* `request`: the literal payload VG sent.
* `response_stream`: the chunks the provider returned, with `received_at_ms` timestamps.
* `provider_reported_usage`: the usage block the provider reported at end-of-stream (tokens for LLM, duration for STT, character count for TTS).

The fixture also pins `expected_cost_usd`, computed at recording time by passing `provider_reported_usage` through `voicegateway.pricing.catalog.calculate_cost` and quantizing the result to 8 decimal places. **This locks the cost math at the recording's price**: if the catalog is updated later, the fixture's `expected_cost_usd` stays at the price-at-recording. The fixture validates VG's *math*, not "today's price."

The filename convention is locked at `<provider>_<model>_<modality>_<mode>_<YYYY-MM-DD>.json`. The date drives the staleness check.
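
Decoding a name in that convention could be done along these lines. This is a sketch, not the helper in `_loader.py`: the function name, the sample filename, and the assumptions that the provider and mode contain no underscores and that the modality is one of `llm`, `stt`, or `tts` are all illustrative.

```python
import re
from datetime import date

# Assumes provider and mode contain no underscores; the model segment may.
FIXTURE_NAME = re.compile(
    r"^(?P<provider>[^_]+)_(?P<model>.+)_(?P<modality>llm|stt|tts)"
    r"_(?P<mode>[^_]+)_(?P<recorded>\d{4}-\d{2}-\d{2})\.json$"
)


def decode_fixture_name(filename):
    """Split a fixture filename into its parts, parsing the date."""
    match = FIXTURE_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a fixture filename: {filename!r}")
    parts = match.groupdict()
    parts["recorded"] = date.fromisoformat(parts["recorded"])
    return parts


# Hypothetical filename, for illustration only.
parts = decode_fixture_name("exampleprovider_some_model_llm_stream_2025-01-15.json")
age_days = (date(2025, 3, 1) - parts["recorded"]).days
```

Parsing the date into a `date` object keeps the staleness comparison a plain subtraction.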

### What the replay tests assert

`src/voicegateway/tests/test_streaming_cost_accounting.py` parameterizes over every committed fixture and asserts three things per fixture:

1. **Unit-count consistency**: `provider_reported_usage` agrees with the actual contents of `response_stream`. For LLM, the normalized `input_tokens` / `output_tokens` / `total_tokens` must equal the values inside the trailing ChatCompletion usage chunk. For STT, `audio_seconds` must equal Deepgram's `metadata.duration`. For TTS, `character_count` must equal `len(request.transcript)`. Catches recorder field-name typos, provider schema drift, and off-by-one normalization.
2. **Cost calculation**: `calculate_cost(provider_reported_usage)` quantized to 8 dp must equal `fixture.expected_cost_usd` quantized to 8 dp. Catches cost-layer regressions (modality-dispatch bugs, pricing-source attribution drift, Decimal precision losses).
3. **TTFB hook behavior** (stream fixtures only): a wrapper that calls `_mark_first_byte` partway through must produce `ttfb_ms < total_latency_ms`. A wrapper that never calls it must produce `ttfb_ms == total_latency_ms` (the documented fallback). Catches modality refactors that forget to wire TTFB.
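
The first two assertions can be sketched against a synthetic LLM fixture. Everything here is an assumption made for illustration: the chunk layout under a `data` key, the sample token counts, the prices inside `price_llm`, and the helper names. Only the `prompt_tokens` / `completion_tokens` / `total_tokens` field names follow the ChatCompletion usage block.

```python
from decimal import Decimal

EIGHT_DP = Decimal("0.00000001")

# Synthetic fixture; real fixtures are recorded from provider responses.
fixture = {
    "response_stream": [
        {"received_at_ms": 120, "data": {"choices": [{"delta": {"content": "Hi"}}]}},
        {"received_at_ms": 180, "data": {"choices": [], "usage": {
            "prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15}}},
    ],
    "provider_reported_usage": {"input_tokens": 12, "output_tokens": 3, "total_tokens": 15},
    "expected_cost_usd": "0.00001800",
}


def trailing_usage(stream):
    """Return the usage block from the last chunk that carries one."""
    for chunk in reversed(stream):
        usage = chunk["data"].get("usage")
        if usage:
            return usage
    raise AssertionError("stream has no usage chunk")


def check_unit_counts(fx):
    raw = trailing_usage(fx["response_stream"])
    reported = fx["provider_reported_usage"]
    assert reported["input_tokens"] == raw["prompt_tokens"]
    assert reported["output_tokens"] == raw["completion_tokens"]
    assert reported["total_tokens"] == raw["total_tokens"]


def price_llm(usage):
    # Hypothetical prices standing in for the pricing facade.
    return Decimal("0.000001") * usage["input_tokens"] + Decimal("0.000002") * usage["output_tokens"]


def check_cost(fx, price_fn):
    computed = price_fn(fx["provider_reported_usage"]).quantize(EIGHT_DP)
    expected = Decimal(fx["expected_cost_usd"]).quantize(EIGHT_DP)
    assert computed == expected


check_unit_counts(fixture)
check_cost(fixture, price_llm)
```

Quantizing both sides before comparing keeps the check insensitive to trailing-zero differences between the stored value and the freshly computed one.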

Separately, `src/voicegateway/tests/test_ttfb_hook_coverage.py` runs the TTFB-hook contract against synthetic streams for every modality. It is gated against `wrap_provider`'s dispatch table, so a future modality cannot land without TTFB coverage.

### Honest limits of the substitute strategy

Fixture replay is not a complete substitute for production traffic. It does **not** catch:

* **Real-time streaming behavior**: replay is sequential and synchronous. We do not simulate network jitter, partial chunks split across TCP packets, or out-of-order delivery.
* **Provider-side correctness**: if Deepgram's reported usage is off by 0.1 seconds, the fixture accepts that as ground truth. The suite validates VG's accounting matches the provider's, not whether the provider is right.
* **Stale fixtures**: recorded fixtures capture provider behavior at a point in time. If a provider changes its streaming format, the fixture's `response_stream` no longer matches what VG would see today. The filename's date convention surfaces staleness; a quarterly refresh task is on the maintenance backlog.
* **End-to-end LiveKit session validation**: the wrappers are tested in isolation, not as part of a real `AgentSession`. Session-level integration testing is deferred (it sits in the OpenRTC-Python Phase 2 plan).

The architecture is honest about this scope: cost tracking is validated against fixture-recorded provider responses, not against real production traffic. Without the fixture-replay phase, that distinction would be invisible; with it, the per-fixture date and provider attribution make the validation surface explicit.

### Where to find each piece

* `src/voicegateway/pricing/catalog.py`, `llm.py`, `stt.py`, `tts.py`: the pricing layer.
* `src/voicegateway/middleware/cost_tracker.py`: per-request record builder.
* `src/voicegateway/middleware/instrumented_provider.py`: `_InstrumentedBase` + `wrap_provider` + the TTFB / log_request hooks.
* `src/voicegateway/tests/fixtures/streaming/`: recorded fixtures, schema, loader.
  * `_schema.py`: `StreamingFixture` Pydantic model.
  * `_loader.py`: `discover_fixtures`, `load_fixture`, filename-decode helper.
  * `README.md`: the fixture format and refresh policy.
  * `PLACEHOLDER.md`: runbook for recording the six minimum fixtures.
* `src/voicegateway/tests/fixtures/streaming/record_streaming_fixtures.py`: the dev-only recorder, gated behind `--record` and `--confirm`. Its module docstring documents cost expectations and operational warnings.
* `src/voicegateway/tests/test_streaming_cost_accounting.py`: the three-assertion replay suite.
* `src/voicegateway/tests/test_ttfb_hook_coverage.py`: per-modality TTFB hardening.
