overflow-inference
the asked-of humd is full; the work flows to the one with capacity
See sim/tests/overflow_inference.rs for the executable form.
The setup
Trust tier T3/T4 — federated or open mesh. Two humds with asymmetric capacity:
- humd-A (gateway). Has a nestler attached (an openai-server bee fronting a public HTTPS endpoint, convention-stateful, medium richness). It announces nests: [] for the requested model, or can_relay: true with no local capacity. It is the door, not the kitchen.
- humd-B (worker). Hosts a claude-cli nest and advertises nests: ["claude-cli"], hosts: [...] in its PeerCapabilities, with available inference slots.
Both humds belong to the same ensemble. Discovery has already populated each side’s peer registry with the other’s caps. No nestler is attached to humd-B; its role is purely to host hums on behalf of routed prompts.
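The asymmetry comes down to what each side advertises. The sketch below is a simplified stand-in for PeerCapabilities, not the real struct. It keeps only nests and can_relay from the text and adds two hypothetical fields, models and free_slots, so it can express "can serve this model right now". The model name "claude-model" is likewise a placeholder.

```rust
use std::collections::HashMap;

/// Simplified stand-in for PeerCapabilities.
/// `nests` and `can_relay` come from the scenario; `models` and
/// `free_slots` are hypothetical fields added for illustration.
#[derive(Debug, Clone)]
pub struct PeerCapabilities {
    pub nests: Vec<String>,
    pub models: Vec<String>,
    pub free_slots: u32,
    pub can_relay: bool,
}

impl PeerCapabilities {
    /// True if the peer claims it can run `model` right now:
    /// it has at least one nest, covers the model, and has a free slot.
    pub fn can_serve(&self, model: &str) -> bool {
        !self.nests.is_empty()
            && self.models.iter().any(|m| m == model)
            && self.free_slots > 0
    }
}

/// Builds the two sides of the scenario (placeholder model name).
pub fn scenario_caps() -> HashMap<&'static str, PeerCapabilities> {
    let mut caps = HashMap::new();
    // humd-A: the door, not the kitchen. No nests, no slots, but relays.
    caps.insert("humd-A", PeerCapabilities {
        nests: vec![],
        models: vec![],
        free_slots: 0,
        can_relay: true,
    });
    // humd-B: hosts a claude-cli nest with spare inference slots.
    // Relaying is not exercised by this scenario, so it is left off.
    caps.insert("humd-B", PeerCapabilities {
        nests: vec!["claude-cli".to_string()],
        models: vec!["claude-model".to_string()],
        free_slots: 2,
        can_relay: false,
    });
    caps
}
```

With these caps, humd-A answers false for the model and true for can_relay, while humd-B answers true. That split is what forces the overflow.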
The happy path
1. A client hits humd-A's openai-server nestler with a prompt for a model only humd-B can serve. The nestler emits chi:"prompt" into humd-A's daemon.
2. humd-A consults the ensemble: it has no local nest for this model (and no spare slot in any case), while humd-B advertises both. humd-A picks humd-B (capacity-aware scoring: free slots, RTT, advertised model coverage) and emits an overflow.route decision trace.
3. humd-A mints a fresh sigil for the routed hum and forwards the prompt as chi:"prompt" with to: <humd-B HumdId> and a bookkeeping field origin: <humd-A HumdId>, so humd-B knows where to stream petals back.
4. humd-B accepts, spawns the brood on its local claude-cli nest, and begins blooming. Every outbound petal is routed to humd-A as well as kept locally; humd-A forwards each one onto the originating nestler's stream so the HTTPS client sees real-time chunks. When chi:"finish" lands on humd-B, humd-B forwards it to humd-A, and humd-A closes the SSE/HTTP response with the matching usage.
5. After close, humd-A still holds the full transcript replicated from humd-B: the gateway keeps a local copy for audit and retry, not just for live forwarding.
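Steps 2 and 3 can be sketched as a selection followed by building the forwarded envelope. This is a minimal sketch, not Ensemble::route. PeerView, RoutedPrompt, pick_peer, route_prompt, and the counter-based sigil are all hypothetical. The scoring rule is an assumption too: it treats model coverage and a free slot as hard filters, then prefers more free slots and breaks ties on lower RTT.

```rust
use std::time::Duration;

/// Hypothetical view of one peer as seen from humd-A's registry.
#[derive(Debug, Clone)]
pub struct PeerView {
    pub humd_id: String,
    pub models: Vec<String>,
    pub free_slots: u32,
    pub rtt: Duration,
}

/// Hypothetical forwarded prompt. `to` and `origin` mirror the fields
/// named in step 3; `sigil` is freshly minted by the gateway.
#[derive(Debug, Clone, PartialEq)]
pub struct RoutedPrompt {
    pub chi: &'static str,
    pub sigil: u64,
    pub to: String,
    pub origin: String,
    pub body: String,
}

/// Capacity-aware pick: coverage and a free slot are hard filters,
/// then the most free slots wins and lower RTT breaks ties.
pub fn pick_peer<'a>(peers: &'a [PeerView], model: &str) -> Option<&'a PeerView> {
    peers
        .iter()
        .filter(|p| p.free_slots > 0 && p.models.iter().any(|m| m == model))
        .max_by(|a, b| {
            a.free_slots
                .cmp(&b.free_slots)
                .then_with(|| b.rtt.cmp(&a.rtt)) // lower RTT ranks higher
        })
}

/// Mints a sigil (a plain counter here, purely illustrative) and builds
/// the chi:"prompt" envelope carrying the explicit return path.
pub fn route_prompt(self_id: &str, next_sigil: &mut u64, peer: &PeerView, body: &str) -> RoutedPrompt {
    *next_sigil += 1;
    RoutedPrompt {
        chi: "prompt",
        sigil: *next_sigil,
        to: peer.humd_id.clone(),
        origin: self_id.to_string(),
        body: body.to_string(),
    }
}
```

A lexicographic ordering keeps the decision explainable in the overflow.route trace ("most free slots, then nearest"). A weighted score would blend the factors more smoothly but is harder to assert on in a test.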
The failure modes
- Capacity lie. humd-B advertised capacity but is in fact full. humd-A's route attempt must surface chi:"error" with qualifier:"overflow.no-capacity" and then either retry against another peer or fail the client cleanly. It must never hang.
- Drop mid-stream. humd-B's link to humd-A drops between chi:"chunk" 5 and 6. The test must catch this as a routed-bloom error surfaced to humd-A's nestler, not a silent truncation. Bonus: on heal, wane lets humd-A fetch the missing slice without re-prompting.
- Replication gap. humd-A receives the live chunks, but the post-close audit copy is incomplete (missing tool-calls, missing drone, missing perf-marks). The test asserts that the replicated transcript on humd-A is byte-equivalent to humd-B's local hum log for the sigil.
- Misroute. humd-A picks a peer that has the nest kind but not the specific model. The brood emits chi:"error" qualified overflow.model-unavailable; humd-A must propagate it to the client with the same qualifier, not a generic 500.
- No eligible peer. No humd in the ensemble advertises the model. humd-A must fail synchronously with overflow.no-route before any network attempt, not after a timeout.
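Three of these modes reduce to small, testable decisions on humd-A. The sketch below is illustrative, and every name in it is hypothetical: Petal, RouteFailure, preflight, and classify_end. It shows three behaviors. "No eligible peer" is decided synchronously from advertised caps, before anything touches the network. A peer-reported qualifier such as overflow.model-unavailable is passed through verbatim. A stream that ends with neither chi:"finish" nor chi:"error" is reported as a truncation rather than a clean close. The scenario does not name a qualifier for the drop case, so none is invented here.

```rust
/// Hypothetical petal kinds as seen by humd-A on a routed stream.
#[derive(Debug, PartialEq)]
pub enum Petal {
    Chunk(u32),
    Finish,
    Error(&'static str),
}

/// Hypothetical failure classification surfaced to humd-A's nestler.
#[derive(Debug, PartialEq)]
pub enum RouteFailure {
    /// Decided locally: nothing in the ensemble advertises the model.
    NoRoute,
    /// Peer-reported error; the qualifier is propagated unchanged.
    Qualified(&'static str),
    /// Link dropped before chi:"finish"; last chunk index seen, if any.
    Truncated { last_chunk: Option<u32> },
}

/// Fails fast, with no network attempt, if no peer covers `model`.
/// `advertised` pairs each HumdId with the models it claims.
pub fn preflight<'a>(model: &str, advertised: &[(&'a str, &[&str])]) -> Result<Vec<&'a str>, RouteFailure> {
    let candidates: Vec<&'a str> = advertised
        .iter()
        .filter(|entry| entry.1.iter().any(|m| *m == model))
        .map(|entry| entry.0)
        .collect();
    if candidates.is_empty() {
        Err(RouteFailure::NoRoute)
    } else {
        Ok(candidates)
    }
}

/// Decides how a routed stream ended, given the petals humd-A saw.
pub fn classify_end(petals: &[Petal]) -> Result<(), RouteFailure> {
    let mut last_chunk = None;
    for p in petals {
        match p {
            Petal::Chunk(i) => last_chunk = Some(*i),
            Petal::Finish => return Ok(()),
            Petal::Error(q) => return Err(RouteFailure::Qualified(*q)),
        }
    }
    // Neither finish nor error arrived: the link dropped mid-stream.
    Err(RouteFailure::Truncated { last_chunk })
}
```

A stream that stops after chunk 5 therefore classifies as Truncated { last_chunk: Some(5) }. On heal, that index is what a wane-based catch-up would resume from.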
The success criteria
- humd-A emits exactly one nest.overflow.routed trace naming the chosen HumdId before the first chunk is requested upstream.
- humd-B's local tap receives chi:"prompt" with origin = humd-A.id within RTT + 50ms of step 1.
- The HTTPS client connected to humd-A receives the first SSE chunk within RTT_AB + RTT_AB + first-token-latency + 100ms (one RTT to humd-B, one back for the first petal).
- The client's terminal SSE event carries usage.output_tokens > 0 and matches the usage on humd-B's local chi:"finish" exactly.
- After close, humd-A's hum store for the routed sigil contains the full ordered petal log; comparison against humd-B's store yields zero diff, and WaneTracker::is_behind on humd-A reads false against humd-B's tip.
- For each failure mode above, the corresponding error qualifier reaches the client within RTT + 200ms of detection.
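The timing criteria are plain arithmetic over a measured RTT_AB and first-token latency. For example, with RTT_AB = 30ms and a first-token latency of 400ms, the first-chunk budget is 30 + 30 + 400 + 100 = 560ms. A test harness could compute all three budgets from those two inputs. The Budgets struct and the budgets and within helpers below are hypothetical names, not part of the sim crate.

```rust
use std::time::Duration;

/// Hypothetical bundle of the deadlines named in the success criteria.
pub struct Budgets {
    pub prompt_delivery: Duration,
    pub first_chunk: Duration,
    pub error_surface: Duration,
}

/// Derives every deadline from the measured A<->B RTT and first-token latency.
pub fn budgets(rtt_ab: Duration, first_token_latency: Duration) -> Budgets {
    Budgets {
        // humd-B's tap sees chi:"prompt" within RTT + 50ms of step 1.
        prompt_delivery: rtt_ab + Duration::from_millis(50),
        // First SSE chunk: RTT_AB + RTT_AB + first-token-latency + 100ms.
        first_chunk: rtt_ab + rtt_ab + first_token_latency + Duration::from_millis(100),
        // Any failure qualifier reaches the client within RTT + 200ms of detection.
        error_surface: rtt_ab + Duration::from_millis(200),
    }
}

/// True if an observed latency meets its budget (inclusive).
pub fn within(observed: Duration, budget: Duration) -> bool {
    observed <= budget
}
```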
What this scenario validates
- Capacity-aware routing. PeerCapabilities.hosts/caps.nests plus the advertised slot count drive a real selection decision, not just a lookup.
- Transcript replication back to the requester. The originating humd ends up with the full hum on disk, not just the live stream. This is what makes the gateway useful for retry, audit, and later attach-from-elsewhere.
- Cross-humd origin tracking. Petals flow back along an explicit return path, not through ambient broadcast.
- Graceful failure on every degenerate input. Lying peer, vanishing peer, model-shaped mismatch, empty mesh: each has a distinct qualifier surfaced to the client.
- Routing surface under load. Same Ensemble::route primitive as the other scenarios, here with the added pressure of choosing which peer when more than one could serve.
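Origin tracking and transcript replication can be illustrated together. In this sketch the worker stores each petal locally and addresses a copy to the humd named in origin, rather than broadcasting it. The gateway replicates only petals addressed to itself and forwards them to its nestler stream. A zero-diff check then compares the two logs for the sigil, which is the property the "Replication gap" failure mode tests. Everything here is hypothetical and in-memory: Petal, HumStore, the outbox, and the three helper functions. The real hum store, wire format, and WaneTracker are not modelled.

```rust
use std::collections::HashMap;

/// Hypothetical minimal petal: owning hum (sigil), order, raw payload.
#[derive(Debug, Clone, PartialEq)]
pub struct Petal {
    pub sigil: u64,
    pub seq: u32,
    pub bytes: Vec<u8>,
}

/// Hypothetical per-humd store: one ordered petal log per sigil.
#[derive(Default)]
pub struct HumStore {
    pub logs: HashMap<u64, Vec<Petal>>,
}

impl HumStore {
    pub fn append(&mut self, p: Petal) {
        self.logs.entry(p.sigil).or_default().push(p);
    }
    pub fn log(&self, sigil: u64) -> &[Petal] {
        self.logs.get(&sigil).map(Vec::as_slice).unwrap_or(&[])
    }
}

/// Worker side: keep the petal locally AND address it to `origin`,
/// the explicit return path carried on the routed prompt.
pub fn emit_on_worker(local: &mut HumStore, origin: &str, p: Petal, outbox: &mut Vec<(String, Petal)>) {
    local.append(p.clone());
    outbox.push((origin.to_string(), p));
}

/// Gateway side: replicate petals addressed to this humd and forward
/// them to the originating nestler's stream; ignore anything else.
pub fn receive_on_gateway(self_id: &str, store: &mut HumStore, outbox: Vec<(String, Petal)>, nestler_stream: &mut Vec<Petal>) {
    for (to, p) in outbox {
        if to == self_id {
            nestler_stream.push(p.clone());
            store.append(p);
        }
    }
}

/// Zero-diff check: the replicated log equals the worker's log for `sigil`.
pub fn zero_diff(a: &HumStore, b: &HumStore, sigil: u64) -> bool {
    a.log(sigil) == b.log(sigil)
}
```

Dropping any petal from the gateway's copy makes zero_diff return false. In this toy model, that is how an incomplete audit copy would show up.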