partition-and-heal
the wire breaks; both sides keep humming; the wire mends; the wanes reconcile
See sim/tests/partition_and_heal.rs for the executable form.
The setup
Trust tier — irrelevant; this scenario is about reconciliation mechanics, not trust. Use T2 for concreteness. Two humds, both hosting nestlers, both attached to the same hum:
- humd-A — has a nestler driving. Owns one half of the ongoing hum’s local wane.
- humd-B — has a second nestler also attached, kept in lockstep
via the tee from
co-pilot. Owns the other half.
Both share the sigil. WaneTracker on each side advances as petals
land locally. The in-memory InMemoryEndpoint between them is wrapped
in a sim middleware that can be commanded to partition (drop all
tones in both directions) and heal (resume delivery, optionally
with a backlog flush from a buffered queue).
A petal source — a real or simulated brood — is colocated with humd-A and continues producing petals during the partition.
The happy path
- Hum opens. Both nestlers see
chi:"hello", then a clean prefix ofchi:"chunk"petals. Wanes on both sides advance in lockstep up to tickN. - Sim partitions the link between humd-A and humd-B. From this
moment, no tones cross. Both daemons trace
ensemble.peer.unreachablefor the other id; drone stays quiet — no panic, no spammy retry storm, no surfaced error to the nestlers beyond a singlechi:"breath"qualifiedpeer.partition. - humd-A’s brood continues producing petals. humd-A’s local nestler
sees
N+1, N+2, …, N+k. humd-A’s wane advances toN+k. - humd-B, being a tee-only target for this hum, receives nothing
during partition. Its wane stays pinned at
N. Its nestler sees no fake petals, no replay loop, no synthetic activity. It simply waits. - Sim heals the link. Both daemons trace
ensemble.peer.reachable. - Each side issues
wane.compareagainst the other (the reconciliation handshake). humd-A discovers humd-B is behind bykpetals for this sigil; humd-B discovers it is behind. humd-A replays petalsN+1 … N+kin order, exactly once. - humd-B’s nestler observes the
kmissing petals appear in order after a silent gap. Its wane catches up toN+k. FromN+k+1onward, both humds resume lockstep tee.
The failure modes
- Duplicate petals. Reconciliation replays a petal humd-B already
has, perhaps because the partition happened to drop only one
direction. Test asserts humd-B’s transcript shows each
ridexactly once across the full run. - Lost petals. Reconciliation misses a petal — humd-B’s
transcript is missing some
N+i. Test asserts strict monotonicity: every tick from 1 to finalMis present on both sides. - Divergent wane after heal. Either
WaneTracker::is_behindremains true after reconciliation settles, or the two tips disagree. Test asserts: 30 seconds (sim time) after heal, both tips equal andis_behind == falseon both sides. - Drone noise during partition. humd-A or humd-B’s drone fires
during the partition window with false-positive errors or
speculative re-prompts. Test sets a drone-quiet assertion across
the partition window — no
chi:"drone"with severity >= warning while the peer is known-unreachable for the same hum. - Split-brain prompts. humd-B’s nestler manages to emit a
chi:"prompt"during partition (it shouldn’t, the hum is hosted on humd-A). The local humd-B daemon must reject withchi:"error"qualifiedpeer.partition.write-denied. After heal, no buffered prompts must replay. - Heal storm. Reconciliation floods the wire with every petal ever produced instead of the diff slice. Test asserts the number of tones flowing during reconciliation is bounded by the wane delta, not by total bloom size.
The success criteria
- During the partition window, humd-B’s tap receives zero
chi:"chunk"tones. humd-A’s tap receives the live stream uninterrupted. - During the partition window, both humds trace
ensemble.peer.unreachableexactly once per direction; neither emits achi:"drone"of severity warning-or-higher attributed to the unreachable peer. - After heal, humd-B’s tap receives exactly
k = wane_A - wane_Bpetals before any new petals, in strict sigil-order, withinRTT + 100msof heal detection. - After reconciliation, both
WaneTrackers report the same tip for the sigil and bothis_behindqueries return false. - For every
ridproduced by the brood during the entire run, both humds’ final transcripts contain it exactly once. Set-equality with multiplicity asserted programmatically. - The terminal
chi:"finish"arrives on both taps withusage.output_tokens > 0and identical counts.
What this scenario validates
- Wane convergence. The lazy, Matrix-shaped reconciliation
protocol in
thrum_core::WaneTrackersurvives an arbitrary-length partition with no replay duplicates and no missing petals. - No duplicate / no lost petals. The catch-up slice is precisely the diff, computed from wane comparison, not from time windows or full re-sync.
- Drone-quiet during partition. A known peer being unreachable is a single observable event, not a recurring alarm. The drone policy distinguishes “we know it’s gone” from “something is wrong.”
- Partition tolerance surface.
Ensemblekeeps the peer in its registry across the outage, marks it unreachable, and resumes routing on heal — no peer flap, no churn inpeers(). - Locality of authority. The hum is hosted on humd-A; humd-B cannot accept writes for it during the partition, and the system refuses to fabricate a split-brain history.