Thermodynamic Machine Learning · MMXXVI
Experiment · 3.VI.MMXXVI

Exp 4 — Reversible Kernel: τ̂ Unresolved (P0-HALT)

Entry 7

The reversible kernel the theorem actually requires does not equilibrate at accessible scale, so P0 halts before any prediction is exercised.

The question

Can we measure a finite integrated autocorrelation time $\tau_{int}$ for the $A2$-satisfying negative-phase sampler, the precondition for the $A6$ premise $K \gg \tau_{int}$? This is the complete technical record. The setup follows on from experiments/exp3 (a deterministic alternating-scan kernel) by swapping in the reversible object the proof's spectral machinery needs.

The setup

Substrate: Lightning AI H200 (80→141 GB), dtm-replication @ 7c22d19, thrml 0.1.3 plus the EXP4-REVERSIBLE-SCAN patch (v2 toggle), single-input 60_12 DTM-MNIST conditional $\pi_\theta(\cdot \mid x_0)$, seed 0. The negative-phase kernel is the symmetrized four-block Gibbs sweep $M = \tfrac{1}{2}(P_{fwd} + P_{rev})$; self-adjointness re-passed at $\sim 10^{-17}$ (the deterministic exp3 kernel was confirmed non-reversible at $\sim 10^{-2}$).
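To illustrate why symmetrizing the scan order restores reversibility, here is a minimal sketch on a toy problem. It is not the thrml kernel or the four-block DTM sweep: the two-variable target, its weights, and every function name below are assumptions made for illustration. It builds two single-coordinate Gibbs kernels, composes them in both orders, and compares the detailed-balance residual $\max_{ij} |\pi_i P_{ij} - \pi_j P_{ji}|$ of the one-direction scan against that of $\tfrac{1}{2}(P_{fwd} + P_{rev})$.

```python
from itertools import product

# Toy target over two binary variables (a, b); the weights are illustrative only.
states = list(product([0, 1], repeat=2))
pi = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def gibbs_kernel(block):
    """Transition matrix that resamples coordinate `block` from its conditional."""
    other = 1 - block
    K = [[0.0] * len(states) for _ in states]
    for i, s in enumerate(states):
        # Normalizer: total mass of states sharing the untouched coordinate.
        norm = sum(pi[u] for u in states if u[other] == s[other])
        for j, t in enumerate(states):
            if t[other] == s[other]:
                K[i][j] = pi[t] / norm
    return K

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def detailed_balance_residual(P):
    """max_ij |pi_i P_ij - pi_j P_ji|; zero iff P is reversible w.r.t. pi."""
    return max(abs(pi[s] * P[i][j] - pi[t] * P[j][i])
               for i, s in enumerate(states) for j, t in enumerate(states))

G_a, G_b = gibbs_kernel(0), gibbs_kernel(1)
P_fwd = matmul(G_a, G_b)  # update a, then b
P_rev = matmul(G_b, G_a)  # update b, then a
M = [[0.5 * (f + r) for f, r in zip(row_f, row_r)]
     for row_f, row_r in zip(P_fwd, P_rev)]

res_fwd = detailed_balance_residual(P_fwd)  # clearly nonzero
res_sym = detailed_balance_residual(M)      # zero up to rounding
print(res_fwd, res_sym)
```

The one-direction scan leaves $\pi$ invariant but is not self-adjoint, because its adjoint is the reverse-order scan; averaging the two orders is the smallest change that makes the kernel reversible.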

The probe is the non-circular doubling-stability measurement: a per-chain half-Sokal $\tau_{int}$ estimator on a self-consistent window ($L \ge 5\tau$), with $L$ doubled and warm-up scaled to $\hat{\tau}$. The stability rule resolves $\hat{\tau}$ only when $|\tau(2L) - \tau(L)| / \tau(L) < 0.15$.
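For readers unfamiliar with windowed autocorrelation estimators, the following is a rough sketch of the textbook self-consistent-window idea, not the estimator used in this run: the exact "half-Sokal" variant is not specified here, and in the run the self-consistency condition is on the trajectory length $L$. The sketch assumes the convention $\tau = \tfrac{1}{2} + \sum_{t=1}^{W} \rho(t)$ with the sum cut at the first lag $W \ge c\,\tau$; the function names, the constant `c`, and the synthetic AR(1) test chain are all illustrative assumptions.

```python
import random

def autocorrelation(x, lag, mean, var):
    """Biased (1/n) estimate of the normalized autocorrelation at one lag."""
    n = len(x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

def tau_int_windowed(x, c=5.0):
    """Return (tau, W) with tau = 1/2 + sum_{t=1..W} rho(t), W the first lag >= c * tau.

    Returns (tau, None) if no self-consistent window is found within len(x) // 2 lags.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    tau = 0.5
    for window in range(1, n // 2):
        tau += autocorrelation(x, window, mean, var)
        if window >= c * tau:
            return tau, window
    return tau, None

# Synthetic AR(1) chain x_t = phi * x_{t-1} + noise, for which rho(t) = phi**t,
# so the exact value under this convention is 1/2 + phi / (1 - phi) = 4.5.
rng = random.Random(0)
phi = 0.8
x = [0.0]
for _ in range(20_000):
    x.append(phi * x[-1] + rng.gauss(0.0, 1.0))

tau, window = tau_int_windowed(x)
print(tau, window)
```

On a chain whose autocorrelation never decays inside the available data, the loop would exhaust its lags without finding a window and return `None`, which is the kind of failure the doubling test is designed to expose.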

A v2 order-coin toggle made the run affordable: the per-chain coin forced XLA to compute both sweeps under the 400-chain vmap (38.6 s/epoch). Threading a shared order_key in training (true lax.cond, one sweep) gives 11.6 s/epoch, while diagnostics keep the per-chain coin, so the across-chain SEM used in P5 stays exactly independent. The per-chain marginal kernel is the identical $\tfrac{1}{2}(P_{fwd}+P_{rev})$ in both modes.

The result

$\tau_{max}$ grows dead-linearly in the trajectory length $L$, with $\tau / L$ essentially constant ($\approx 0.16$) across six window lengths (five doublings):

| $L$ (sweeps) | warm-up | $\tau_{max}$ | $\tau/L$ | self-consistent ($L \ge 5\tau$) |
|---|---|---|---|---|
| 1,000 | 200 | 166.7 | 0.167 | yes |
| 2,000 | 833 | 333.2 | 0.167 | yes |
| 4,000 | 1,666 | 662.6 | 0.166 | yes |
| 8,000 | 3,313 | 1,325 | 0.166 | yes |
| 16,000 | 6,627 | 2,431 | 0.152 | yes |
| 32,000 | 12,153 | 5,280 | 0.165 | yes |

A constant $\tau / L$ means the integrated $\tau_{int}$ accumulates as fast as data is added: the autocorrelation function has not decayed within the window. This is a near-zero effective spectral gap ($\gamma_{eff} \to 0$), i.e. an effectively non-equilibrating chain. The doubling-stability criterion is never met (each doubling roughly doubles $\tau$), so the rule correctly refuses to resolve $\hat{\tau}$. Warm-train ran 200/200 epochs at 11.623 s/epoch, confirming both the at-scale cost ($\approx 6.5$ h for a 2000-epoch run) and the v2 fix.
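The refusal can be reproduced directly from the tabulated values. The sketch below applies the 0.15 relative-change rule to the six $(L, \tau_{max})$ pairs from the results table; the function name and the choice to return $\tau(2L)$ once the rule is met are assumptions for illustration, not the experiment's actual code.

```python
# (L, tau_max) pairs copied from the results table.
curve = [(1_000, 166.7), (2_000, 333.2), (4_000, 662.6),
         (8_000, 1_325.0), (16_000, 2_431.0), (32_000, 5_280.0)]

def resolve_tau(curve, tol=0.15):
    """Return the first tau(2L) within tol relative change of tau(L), else None."""
    for (L, tau_L), (L2, tau_2L) in zip(curve, curve[1:]):
        assert L2 == 2 * L, "consecutive rows must be doublings"
        if abs(tau_2L - tau_L) / tau_L < tol:
            return tau_2L
    return None

# Relative change per doubling and tau / L per window length.
rel_changes = [abs(b[1] - a[1]) / a[1] for a, b in zip(curve, curve[1:])]
ratios = [tau / L for L, tau in curve]
tau_hat = resolve_tau(curve)

print([round(r, 3) for r in rel_changes])
print([round(r, 3) for r in ratios])
print(tau_hat)  # None: tau_hat stays unresolved
```

Every relative change is far above the 0.15 threshold, so no pair of adjacent window lengths comes close to satisfying the rule.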

Consequence: the $\gtrsim 50\cdot\hat{\tau}$ averaging windows are uninstantiable and the $A6$ premise $K \gg \tau_{int}$ is unreachable for this kernel at this checkpoint. Registered outcome (F): P0-HALT.

Scope and caveats

This does show, robustly, that the $A2$-required reversible kernel does not equilibrate at accessible scale (measured curve to $L = 32{,}000$; further doublings could only add more growing-$\tau$ points). It does not by itself distinguish two readings, both giving the same operational verdict: (1) a genuine near-zero gap, where the trained conditional is multimodal and the chain cannot cross basins ($\gamma_{eff} \to 0$, the very plateau regime the theorem is about); or (2) inadequate burn-in, where true $\tau \gg$ warm-up, but if true $\tau$ is unbounded no feasible burn-in helps.

The exp3 comparison is confounded: exp4 changed both the kernel (→ reversible) and the window length, so the jump from $\tau \approx 486$–$500$ to $\tau \ge 5{,}280$ cannot be cleanly attributed. The qualitative tell, that exp3's $\tau$ was stable across $t$ while exp4's grows $\propto L$ with no leveling, favors genuinely slower mixing over mere truncation. The deeper tension worth flagging: exp3's faster-mixing kernel violated $A2$; the $A2$-valid kernel is the slow one.

Honesty: the Studio was cut off (credit exhaustion) after $L = 32{,}000$, before p0_calibrate.json and the A7-spectrum feasibility probe were written, so P0 is sufficient to fire the $\hat{\tau}$-UNRESOLVED HALT but the A7 measurement is unrun. P1–P5 did not run (no recorded DECISION: PROCEED); nothing is reported as measured for them.

No tag flip. The conditional factorization ($A1$–$A8$ + plateau + F4 $\Rightarrow Q_{op} \approx Q_{struct}^{\perp}$) stays [solid] (untouched, since this is an operational test); the operational/unconditional claim stays [conjectured], now for the deeper reason that the chain does not equilibrate at accessible scale. This sharpens Risk 5 (the $A6$ gate) and Risk 1 (at-scale tracking), leaving both [open].


What this feeds: the natural next investigation is distinguishing near-zero-gap from inadequate-burn-in (ACF shape, or an earlier/less-trained checkpoint with finite $\tau$) and reaching the credit-gated A7-spectrum feasibility probe. Both are deferred.

— fin. —