Fleet worker heartbeat contract
VoiceGateway has one producer of worker presence and two stores that consume it. This page is the canonical contract between them. Any change to how either store ingests a heartbeat must update this page and match it, or the two rosters silently diverge (a worker judged "online" in one and "offline" in the other, or attributed to different tenants).

The one producer, the two stores
- Producer: `voicegateway.register_worker(...)` in every agent process. It posts a periodic presence payload (below) to `${VOICEGW_COLLECTOR_URL}/v1/agents/heartbeat` with the tenant's `vk_` ingest key as the bearer token.
- Store A (hosted): `voicegateway-cloud`'s `cloud_workers` table + `POST /v1/agents/heartbeat` / `GET /v1/agents`, surfaced on `dash.voicegateway.dev`.
- Store B (self-hosted): the engine's own `voicegw serve` `workers` table + the same `/v1/agents/heartbeat` / `/v1/agents` routes, surfaced in the OpenOrca console via `/openorca/snapshot` and `/openorca/events`.

Because both stores implement the same `/v1/agents/heartbeat` contract, the
same `register_worker` heartbeat feeds either one: point `VOICEGW_COLLECTOR_URL`
at the cloud for the SaaS dashboard, or at a self-hosted `voicegw serve` for the
OpenOrca console. That symmetry only holds if both ingest identically.
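
To make the producer side concrete, the following is a minimal sketch of the HTTP request a heartbeat amounts to. It is not `register_worker`'s actual implementation: the helper name `build_heartbeat_request`, the fallback URL, and the key value are hypothetical, and the payload is limited to field names this page mentions, with placeholder values. The request is built but not sent.

```python
import json
import os
import time
import urllib.request


def build_heartbeat_request(ingest_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a heartbeat POST for whichever store the env points at."""
    # Hypothetical fallback URL for local experiments; normally set in the environment.
    base = os.environ.get("VOICEGW_COLLECTOR_URL", "http://localhost:8080").rstrip("/")
    return urllib.request.Request(
        url=f"{base}/v1/agents/heartbeat",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            # The tenant is derived from this key on the server, not from the body.
            "Authorization": f"Bearer {ingest_key}",
            "Content-Type": "application/json",
        },
    )


# Field names are ones this page mentions; values are placeholders, and the
# canonical payload may carry more fields than shown here.
example_payload = {
    "tenant_id": "tenant-a",      # advisory only
    "agent_id": "worker-1",       # node identity within the tenant
    "agent_name": "support-bot",  # groups workers, does not identify one
    "status": "idle",
    "active_sessions": 0,
    "ts": time.time(),            # informational, must not drive liveness
}

req = build_heartbeat_request("vk_example", example_payload)
# urllib.request.urlopen(req, timeout=5) would send it to the configured store.
```

Switching between the hosted and self-hosted store changes only the value of `VOICEGW_COLLECTOR_URL`; the request shape stays the same.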
The heartbeat payload (canonical)
`register_worker`'s `presence()` sends exactly this JSON:
Ingestion rules (both stores MUST follow)
1. Tenant is derived server-side from the `vk_` key, never from the body. The `tenant_id` in the payload is advisory only. A worker can only ever be written under the key's tenant, so it can never appear under another tenant.
2. Identity is `(tenant, agent_id)`. `agent_id` is the node identity for upsert, roster keys, and any UI node id. Do not key identity on `agent_name` (it groups workers, it does not identify one).
3. `last_seen` is stamped SERVER-SIDE at ingest (`now()` / `time.time()` on the receiving server). The payload `ts` is informational metadata only and MUST NOT drive liveness: a client clock that is skewed or forged would otherwise read perpetually online or offline.
4. Upsert atomically on `(tenant, agent_id)` via a native `INSERT ... ON CONFLICT DO UPDATE`. A get-then-insert races two concurrent first beats into duplicate rows (and breaks any subsequent read that expects one row).
5. Offline TTL is 45 seconds (three missed ~15s beats). A worker whose server-stamped `last_seen` is older than the TTL reports `status: "offline"` and `active_sessions: 0`, regardless of the last status it sent.
6. Status vocabulary is `idle | busy | offline`. Constrain to this set on ingest; do not store or serve arbitrary client-supplied status strings.
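
The rules above can be sketched end to end against an in-memory SQLite database. This is an illustration under stated assumptions, not either store's code: the table layout, the `ingest` and `roster` helpers, and the `now` parameter (added so the clock can be pinned in tests) are all hypothetical, and the upsert syntax needs SQLite 3.24 or newer.

```python
import sqlite3
import time

OFFLINE_TTL_SECONDS = 45
VALID_STATUSES = {"idle", "busy", "offline"}

# Hypothetical schema: the composite primary key enforces rule 2.
SCHEMA = """
CREATE TABLE IF NOT EXISTS workers (
    tenant_id TEXT NOT NULL,
    agent_id TEXT NOT NULL,
    agent_name TEXT,
    status TEXT NOT NULL,
    active_sessions INTEGER NOT NULL DEFAULT 0,
    last_seen REAL NOT NULL,
    PRIMARY KEY (tenant_id, agent_id)
)
"""


def ingest(db, key_tenant, body, now=None):
    """Store one heartbeat. key_tenant is the tenant resolved from the vk_ key (rule 1)."""
    status = body.get("status")
    if status not in VALID_STATUSES:  # rule 6: reject anything outside the vocabulary
        raise ValueError(f"unknown status: {status!r}")
    stamped = time.time() if now is None else now  # rule 3: body["ts"] is never read
    db.execute(
        """
        INSERT INTO workers
            (tenant_id, agent_id, agent_name, status, active_sessions, last_seen)
        VALUES (?, ?, ?, ?, ?, ?)
        ON CONFLICT (tenant_id, agent_id) DO UPDATE SET
            agent_name = excluded.agent_name,
            status = excluded.status,
            active_sessions = excluded.active_sessions,
            last_seen = excluded.last_seen
        """,  # rule 4: one atomic statement, no get-then-insert
        (key_tenant, body["agent_id"], body.get("agent_name"), status,
         int(body.get("active_sessions", 0)), stamped),
    )
    db.commit()


def roster(db, tenant, now=None):
    """Read a tenant's workers, applying the offline TTL (rule 5) at read time."""
    current = time.time() if now is None else now
    rows = db.execute(
        "SELECT agent_id, status, active_sessions, last_seen "
        "FROM workers WHERE tenant_id = ? ORDER BY agent_id",
        (tenant,),
    ).fetchall()
    result = []
    for agent_id, status, sessions, last_seen in rows:
        if current - last_seen > OFFLINE_TTL_SECONDS:
            status, sessions = "offline", 0
        result.append({"agent_id": agent_id, "status": status, "active_sessions": sessions})
    return result
```

In this sketch a body that claims `"tenant_id": "tenant-b"` while authenticated with tenant-a's key is still written under `tenant-a`, because `ingest` never reads the body's tenant.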
Compatibility matrix
Field / behavior, as of the two current implementations. `cloud_workers` is the
reference; the engine `workers` table (introduced with the OpenOrca console)
must align to it.
| Aspect | cloud_workers (cloud) | engine workers | Status |
|---|---|---|---|
| Primary key | (tenant_id, agent_id) | surrogate id + UniqueConstraint(tenant_id, agent_id) | equivalent uniqueness (OK) |
| `tenant_id` | NOT NULL (from key) | nullable (for the no-credential operator) | ⚠️ engine NULL path enables the duplicate-row race in rule 4 |
| Tenant source | key only, body ignored | key, but falls back to body tenant_id when the key tenant is NULL | ⚠️ align to rule 1 |
| `last_seen` | server-stamped DateTime(tz) | client ts stored as float | ⚠️ primary drift: align to rule 3 (server-stamp) |
| Upsert | native ON CONFLICT DO UPDATE | get-then-insert with IntegrityError retry | ⚠️ align to rule 4 (matches only for non-null tenants) |
| Offline TTL | 45s | 45s | OK |
| Status vocab | idle / busy / offline | idle / busy / offline (but client status passed through unvalidated) | mostly OK, add rule 6 validation |
| Node identity | agent_id | roster keys agent_id, but the OpenOrca mapper keys nodes on agent_name | ⚠️ align to rule 2 |
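
One plausible reading of the `tenant_id` row is the standard SQL behavior that a unique constraint treats NULLs as distinct from one another, so `(NULL, agent_id)` never conflicts with itself. The sketch below demonstrates that with SQLite. The table is a hypothetical stand-in for the engine `workers` table, and the engine's actual database is an assumption this page does not confirm.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE workers (
        id INTEGER PRIMARY KEY,
        tenant_id TEXT,              -- nullable, as in the engine column of the matrix
        agent_id TEXT NOT NULL,
        UNIQUE (tenant_id, agent_id)
    )
    """
)

# Two first beats for the same agent with no tenant: the constraint does not fire,
# so neither an IntegrityError retry nor ON CONFLICT would catch the duplicate.
db.execute("INSERT INTO workers (tenant_id, agent_id) VALUES (NULL, 'worker-1')")
db.execute("INSERT INTO workers (tenant_id, agent_id) VALUES (NULL, 'worker-1')")
null_rows = db.execute(
    "SELECT COUNT(*) FROM workers WHERE tenant_id IS NULL AND agent_id = 'worker-1'"
).fetchone()[0]

# With a non-null tenant the same pair is rejected as expected.
db.execute("INSERT INTO workers (tenant_id, agent_id) VALUES ('tenant-a', 'worker-1')")
try:
    db.execute("INSERT INTO workers (tenant_id, agent_id) VALUES ('tenant-a', 'worker-1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Here `null_rows` ends up as 2 while the non-null duplicate is rejected, which is consistent with the matrix noting that the engine upsert matches the reference only for non-null tenants.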
Keeping them from drifting
- This page is the single source of truth. A PR that changes ingestion in either store must update this page in the same change and satisfy every rule above.
- Prefer sharing the semantics rather than re-deriving them: the offline TTL, the status vocabulary, and the payload field names should have one definition the engine owns (it is the producer), which the cloud consumes.
- The engine-side alignment items (rules 1-4, 6) are tracked against the OpenOrca console backend PR; the cloud side already satisfies the contract.
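
As a sketch of what a single shared definition might look like, the module below collects the TTL, the status vocabulary, and the payload field names named on this page in one place. The module itself, the constant names, and `parse_status` are hypothetical; the field list covers only the names mentioned here and may be shorter than the canonical payload.

```python
from enum import Enum

HEARTBEAT_INTERVAL_SECONDS = 15   # approximate, from the "~15s beats" in rule 5
OFFLINE_TTL_SECONDS = 45          # three missed beats


class WorkerStatus(str, Enum):
    """The only status values either store may persist or serve."""
    IDLE = "idle"
    BUSY = "busy"
    OFFLINE = "offline"


# Payload field names referenced on this page.
FIELD_TENANT_ID = "tenant_id"
FIELD_AGENT_ID = "agent_id"
FIELD_AGENT_NAME = "agent_name"
FIELD_STATUS = "status"
FIELD_ACTIVE_SESSIONS = "active_sessions"
FIELD_TS = "ts"


def parse_status(raw) -> WorkerStatus:
    """Map a client-supplied value onto the vocabulary, or raise ValueError."""
    try:
        return WorkerStatus(raw)
    except ValueError:
        raise ValueError(f"status must be one of idle, busy, offline; got {raw!r}") from None
```

If both stores imported definitions like these instead of restating them, a change to the TTL or the vocabulary would land in one place and could not drift between the two rosters.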