Budget Enforcement
VoiceGateway supports per-project daily budgets with three enforcement modes: warn, throttle, and block. Budgets are enforced at request-completion time inside the cost tracker; the inference factories themselves never raise on budget. The BudgetThrottleSignal and BudgetExceededError types are available for callers that want to wire their own pre-request check (CLI / HTTP / dashboard).
Configuration
Mode 1: Warn
The warn mode logs a warning when the budget is exceeded but allows all requests to proceed. Use this for visibility without disrupting service.
Mode 2: Throttle (caller-driven)
The inference factories do not raise BudgetThrottleSignal themselves. Wire a pre-flight check in your worker if you want the throttle path.
The _budget_enforcer reference is an internal handle today; a public inference.check_budget() helper is planned so callers no longer reach into a private attribute.
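A minimal sketch of such a pre-flight check follows. It does not use the real VoiceGateway API: BudgetThrottleSignal is redefined locally as a stand-in, get_status is a hypothetical callable assumed to return one of the status strings "ok", "warning", or "exceeded", and the model names are placeholders.

```python
from typing import Callable


class BudgetThrottleSignal(Exception):
    """Local stand-in for VoiceGateway's BudgetThrottleSignal (assumed raisable)."""


def preflight_throttle(project: str, get_status: Callable[[str], str]) -> None:
    """Raise the throttle signal if the project's budget status is "exceeded".

    get_status is a hypothetical callable, not a VoiceGateway function.
    """
    if get_status(project) == "exceeded":
        raise BudgetThrottleSignal(f"daily budget exceeded for project {project!r}")


def pick_model(project: str, get_status: Callable[[str], str]) -> str:
    """Use the normal model, or a cheaper one when the pre-flight check throttles."""
    try:
        preflight_throttle(project, get_status)
    except BudgetThrottleSignal:
        return "cheap-model"  # hypothetical model name
    return "default-model"  # hypothetical model name
```

The worker decides what "throttled" means. Here it downgrades to a cheaper model, but it could equally delay or queue the request.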
Mode 3: Block (caller-driven)
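As with throttle mode, a caller that wants hard blocking can run a check before each request and reject the request when the budget is exhausted. The sketch below is illustrative only: BudgetExceededError is a local stand-in for the VoiceGateway type, and get_status, run_inference, and the response shape are hypothetical.

```python
from typing import Callable, Dict


class BudgetExceededError(Exception):
    """Local stand-in for VoiceGateway's BudgetExceededError (assumed raisable)."""


def preflight_block(project: str, get_status: Callable[[str], str]) -> None:
    """Raise if the project's budget status is "exceeded".

    get_status is a hypothetical callable returning "ok", "warning", or "exceeded".
    """
    if get_status(project) == "exceeded":
        raise BudgetExceededError(f"daily budget exceeded for project {project!r}")


def handle_request(
    project: str,
    get_status: Callable[[str], str],
    run_inference: Callable[[], str],
) -> Dict[str, object]:
    """Reject the request up front instead of running inference over budget."""
    try:
        preflight_block(project, get_status)
    except BudgetExceededError as exc:
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "result": run_inference()}
```

Because the check runs before inference, a blocked request incurs no additional cost.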
Budget Status API
Check the budget status before making a request. The BudgetEnforcer.get_budget_status() method returns:
| Status | Condition |
|---|---|
| "ok" | Under 80% of budget |
| "warning" | Between 80% and 100% of budget |
| "exceeded" | At or over 100% of budget |
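The thresholds in the table can be expressed as a small function. This is a sketch of the mapping only, not the BudgetEnforcer implementation; the function name and its spent/limit parameters are assumptions, as is treating exactly 80% as "warning".

```python
def budget_status(spent: float, limit: float) -> str:
    """Map spend against a daily limit to "ok", "warning", or "exceeded"."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = spent / limit
    if ratio >= 1.0:
        return "exceeded"  # at or over 100% of budget
    if ratio >= 0.8:
        return "warning"  # from 80% up to, but not including, 100%
    return "ok"  # under 80% of budget
```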
Cache Behavior
Budget checks are cached in memory with a 30-second TTL to avoid querying SQLite on every request. This means:
- A budget may be briefly exceeded before the cache refreshes
- The maximum over-spend window is 30 seconds of requests
- The TTL is configurable via BudgetEnforcer(cache_ttl_seconds=30.0)
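To illustrate why a stale "ok" can persist for up to one TTL, here is a generic TTL cache sketch. It is not the BudgetEnforcer internals: TTLStatusCache, its load callable (standing in for the SQLite query), and the injectable clock are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class TTLStatusCache:
    """Cache one status per project and reload it once the TTL has elapsed."""

    def __init__(
        self,
        load: Callable[[str], str],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load  # hypothetical stand-in for the database lookup
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, project: str) -> str:
        now = self._clock()
        entry = self._entries.get(project)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: may be stale by up to ttl_seconds
        status = self._load(project)
        self._entries[project] = (now, status)
        return status
```

A shorter TTL narrows the over-spend window at the cost of more frequent lookups.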
Combining with Fallback Chains
The throttle path can be paired with the manual chain walk pattern from Fallback Chains: on BudgetThrottleSignal, walk a chain that ends in a local model.
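One way this pairing might look is sketched below. Everything here is illustrative: BudgetThrottleSignal is a local stand-in for the VoiceGateway type, and run_with_fallback, the preflight callable, and the demo model functions are hypothetical rather than part of the library.

```python
from typing import Callable, Optional, Sequence

Model = Callable[[str], str]


class BudgetThrottleSignal(Exception):
    """Local stand-in for VoiceGateway's BudgetThrottleSignal (assumed raisable)."""


def run_with_fallback(
    prompt: str,
    preflight: Callable[[], None],
    primary: Model,
    fallback_chain: Sequence[Model],
) -> str:
    """Run the primary model unless the pre-flight check throttles; then walk the chain."""
    try:
        preflight()
    except BudgetThrottleSignal:
        last_error: Optional[Exception] = None
        for model in fallback_chain:
            try:
                return model(prompt)
            except Exception as exc:  # this entry failed; try the next one
                last_error = exc
        raise RuntimeError("all fallback models failed") from last_error
    return primary(prompt)


# Demo stubs (hypothetical): a throttling check, a failing hosted model, a local model.
def over_budget() -> None:
    raise BudgetThrottleSignal("over budget")


def under_budget() -> None:
    return None


def cloud(prompt: str) -> str:
    return "cloud:" + prompt


def flaky(prompt: str) -> str:
    raise ConnectionError("provider unavailable")


def local(prompt: str) -> str:
    return "local:" + prompt
```

Ending the chain in a local model means a throttled request still gets an answer, since the local model adds no metered cost.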