The reversible kernel the theorem actually requires does not equilibrate at accessible scale, so P0 halts before any prediction is exercised.
The question
Can we measure a finite integrated autocorrelation time τ for the negative-phase sampler that satisfies the theorem's reversibility requirement? A finite τ is the precondition for the premise. This is the complete technical record. The setup follows on from experiments/exp3 (a deterministic alternating-scan kernel) by swapping in the reversible object the proof's spectral machinery needs.
The setup
Substrate: Lightning AI H200 (80→141 GB), dtm-replication @ 7c22d19, thrml 0.1.3 plus the EXP4-REVERSIBLE-SCAN patch (v2 toggle), single-input 60_12 DTM-MNIST conditional, seed 0. The negative-phase kernel is the symmetrized four-block Gibbs sweep; self-adjointness re-passed for this kernel, whereas the deterministic exp3 kernel was confirmed non-reversible.
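The probe is the non-circular doubling-stability measurement: a per-chain half-Sokal estimator of τ on a self-consistent window, with the trajectory length T doubled at each step and the warm-up rescaled from the previous estimate (the warm column of the results table is consistent with about five times the preceding τ_max). The stability rule resolves τ only when the estimate stops growing under doubling.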
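The estimator named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the run: it assumes "half-Sokal" means the convention τ = 1/2 + Σ ρ(t), and that the self-consistent window is the smallest W with W ≥ c·τ(W). The constant `c = 5.0`, the function names, and the AR(1) test series are all hypothetical choices made for the example.

```python
import random

def half_sokal_tau(x, c=5.0):
    """Integrated autocorrelation time, tau = 1/2 + sum_{t=1..W} rho(t).

    W is the smallest lag with W >= c * tau(W) (self-consistent window).
    Returns (tau, W, resolved); resolved is False if no such W exists
    within the first half of the series. c = 5.0 is an assumed constant.
    """
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    var = sum(v * v for v in d) / n
    if var == 0.0:
        raise ValueError("constant series has no defined autocorrelation")
    tau = 0.5
    for w in range(1, n // 2):
        # Biased empirical autocorrelation at lag w.
        rho = sum(d[i] * d[i + w] for i in range(n - w)) / (n * var)
        tau += rho
        if w >= c * tau:
            return tau, w, True
    return tau, n // 2 - 1, False

def ar1(n, phi, seed=0):
    """Hypothetical test series: x_t = phi * x_{t-1} + N(0, 1) noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

phi = 0.9
exact = 0.5 + phi / (1.0 - phi)  # closed form for AR(1) in this convention
tau, window, resolved = half_sokal_tau(ar1(20000, phi))
print(f"exact={exact:.2f} estimate={tau:.2f} window={window} resolved={resolved}")
```

On a well-mixing series like this one the window closes early and the estimate lands near the closed-form value. A chain whose autocorrelation has not decayed keeps pushing the window outward, which is the behaviour the doubling probe is designed to expose.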
A v2 order-coin toggle made the run affordable: the per-chain coin forced XLA to compute both sweeps under the 400-chain vmap (38.6 s/epoch). Threading a shared order_key in training (true lax.cond, one sweep) gives 11.6 s/epoch while diagnostics keep the per-chain coin — so the across-chain SEM used in P5 stays exactly independent. The per-chain marginal kernel is identical in both modes.
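The cost difference described above follows from how JAX batches `lax.cond`: when the predicate itself is batched under `vmap`, the conditional is converted to a select and both branches are evaluated, whereas an unbatched predicate keeps a real conditional. The sketch below illustrates this with stand-in functions; `sweep_fwd`, `sweep_rev`, and `step` are hypothetical placeholders, not thrml or dtm-replication APIs.

```python
import jax
import jax.numpy as jnp
from jax import lax

def sweep_fwd(x):
    # Hypothetical stand-in for the sweep in one block order.
    return lax.sin(x)

def sweep_rev(x):
    # Hypothetical stand-in for the sweep in the reversed block order.
    return lax.cos(x)

def step(coin, x):
    # One kernel application: the coin picks which sweep runs.
    return lax.cond(coin, sweep_fwd, sweep_rev, x)

def top_level_primitives(fn, *args):
    # Names of the primitives at the top level of the traced program.
    return [eqn.primitive.name for eqn in jax.make_jaxpr(fn)(*args).jaxpr.eqns]

n_chains = 4
xs = jnp.zeros((n_chains, 3))
coins = jnp.array([True, False, True, False])

# Per-chain coin: the predicate is batched, so both sweeps are traced inline.
per_chain = top_level_primitives(jax.vmap(step, in_axes=(0, 0)), coins, xs)

# Shared coin: the predicate is unbatched, so a real conditional survives.
shared = top_level_primitives(
    jax.vmap(step, in_axes=(None, 0)), jnp.array(True), xs
)

print("per-chain:", per_chain)
print("shared:   ", shared)
```

In the per-chain trace both `sin` and `cos` appear at the top level and no `cond` remains; in the shared trace the sweeps sit inside a single `cond`. That is the structural reason a shared key can run one sweep per step.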
The result
τ_max grows linearly in the trajectory length T, with τ_max/T essentially constant (0.152 to 0.167) across the six trajectory lengths measured (five doublings):
| T (sweeps) | warm | τ_max | τ_max/T | self-consistent |
|---|---|---|---|---|
| 1,000 | 200 | 166.7 | 0.167 | yes |
| 2,000 | 833 | 333.2 | 0.167 | yes |
| 4,000 | 1,666 | 662.6 | 0.166 | yes |
| 8,000 | 3,313 | 1,325 | 0.166 | yes |
| 16,000 | 6,627 | 2,431 | 0.152 | yes |
| 32,000 | 12,153 | 5,280 | 0.165 | yes |
A constant τ_max/T means the integrated τ accumulates as fast as data is added: the autocorrelation function has not decayed within the window. This indicates a near-zero spectral gap, that is, an effectively non-equilibrating chain. The doubling-stability criterion is never met (each doubling roughly doubles τ_max), so the rule correctly refuses to resolve τ. Warm-train ran 200/200 epochs at 11.623 s/epoch, confirming both the at-scale cost (about 6.5 h for a 2000-epoch run) and the v2 fix.
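The arithmetic behind this reading can be checked directly from the table. The sketch below recomputes the ratio τ_max/T, the growth factor per doubling, and the projected cost of a 2000-epoch run from the recorded 11.623 s/epoch. The tolerance `TOL` is a hypothetical value chosen for illustration; the record's own stability threshold is not recoverable here.

```python
# (T in sweeps, tau_max) copied from the results table.
rows = [
    (1000, 166.7),
    (2000, 333.2),
    (4000, 662.6),
    (8000, 1325.0),
    (16000, 2431.0),
    (32000, 5280.0),
]

# Ratio tau_max / T for each trajectory length.
ratios = [tau / T for T, tau in rows]

# Growth of tau_max from one trajectory length to the next (T doubles each time).
growth = [rows[i + 1][1] / rows[i][1] for i in range(len(rows) - 1)]

# Hypothetical stability rule: tau_max changes by at most 10% under doubling.
TOL = 0.10
stable = [abs(g - 1.0) <= TOL for g in growth]

# Projected wall-clock cost of a 2000-epoch run at the measured rate.
hours = 2000 * 11.623 / 3600

for (T, tau), r in zip(rows, ratios):
    print(f"T={T:>6}  tau_max={tau:>7.1f}  tau_max/T={r:.3f}")
print("growth per doubling:", [round(g, 2) for g in growth])
print("any doubling stable:", any(stable))
print(f"2000-epoch cost: {hours:.2f} h")
```

Every growth factor sits close to 2, so under any reasonable tolerance no pair of consecutive estimates counts as stable.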
Consequence: the averaging windows are uninstantiable and the premise is unreachable for this kernel at this checkpoint. Registered outcome (F): P0-HALT.
Scope and caveats
This does show, robustly, that the reversible kernel the theorem requires does not equilibrate at accessible scale (curve measured to T = 32,000; further doublings could only add more growing-τ points). It does not by itself distinguish two readings, both of which give the same operational verdict: (1) a genuine near-zero gap, where the trained conditional is multimodal and the chain cannot cross basins (the very plateau regime the theorem is about); or (2) inadequate burn-in, where the true τ exceeds the warm-up, although if the true τ is unbounded no feasible burn-in helps.
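To make reading (1) concrete, here is a worked toy case under stated assumptions. It uses a symmetric two-state chain as a stand-in for two basins (it is not the DTM kernel), with crossing probability $p$ per sweep, the observable $f = \pm 1$, and the convention $\tau = \tfrac12 + \sum_{t \ge 1} \rho(t)$:

$$
P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix},
\qquad
\lambda_2 = 1 - 2p,
\qquad
\rho(t) = \lambda_2^{\,t}
$$

$$
\tau \;=\; \frac12 + \sum_{t \ge 1} \lambda_2^{\,t}
\;=\; \frac{1 + \lambda_2}{2\,(1 - \lambda_2)}
\;=\; \frac{1 - p}{2p}
$$

The spectral gap is $1 - \lambda_2 = 2p$, so $\tau \approx 1/(2p)$ diverges as the crossing probability goes to zero. When $1/(2p)$ is much larger than the trajectory length $T$, no finite run shows $\rho(t)$ decaying, and a windowed estimate can only grow with $T$.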
The exp3 comparison is confounded: exp4 changed both the kernel (to a reversible one) and the window length, so the jump in τ between the two experiments cannot be cleanly attributed. The qualitative tell is that exp3's τ estimate was stable while exp4's grows with no leveling, which favors genuinely slower mixing over mere truncation. The deeper tension worth flagging: exp3's faster-mixing kernel violated the theorem's reversibility requirement, and the kernel that satisfies it is the slow one.
Honesty: the Studio was cut off (credit exhaustion) after , before p0_calibrate.json and the A7-spectrum feasibility probe were written — so P0 is sufficient to fire the -UNRESOLVED HALT but the A7 measurement is unrun. P1–P5 did not run (no recorded DECISION: PROCEED); nothing is reported as measured for them.
No tag flip. The conditional factorization (– + plateau + F4 ) stays [solid] (untouched — this is an operational test); the operational/unconditional claim stays [conjectured], now for the deeper reason that the chain does not equilibrate at accessible scale. This sharpens Risk 5 (the gate) and Risk 1 (at-scale tracking), leaving both [open].
What this feeds: the natural next investigation is distinguishing near-zero-gap from inadequate-burn-in (ACF shape, or an earlier/less-trained checkpoint with finite τ) and reaching the credit-gated A7-spectrum feasibility probe. Both are deferred.