Exp 6 — Checkpoint τ-Sweep: Slow From the Start · Thermodynamic Machine Learning

If the reversible kernel cannot mix at convergence, the natural next question is whether it ever could — so we walked the same trained trajectory backward in time and measured $\hat\tau$ at four checkpoints.

The question

Exp 4 established that on a converged 60_12 DTM the reversible negative-phase kernel does not equilibrate: $\tau \propto L$ out to large lattice depth, so the integrated autocorrelation time $\tau_{int}$ required by assumption $A6$ is never finite. The open follow-up: is non-equilibration a property of the converged model only — a deep-checkpoint pathology that earlier, less-trained models might escape — or is it fundamental on this substrate, present from the start? That distinguishes the two horns of the spine's "fundamentality open" question.

The setup

One cumulative training trajectory (single-input 60_12 DTM-MNIST, seed 0, N_CHAINS=32, vanilla ACP-off, reversible patch live throughout), probed at four checkpoints $t \in \{25, 50, 100, 200\}$ epochs. Each probe runs exp4's non-circular doubling-stability $\hat\tau$ rule (half-Sokal $\tau_{int}$ , TAU_TOL=0.15, SOKAL_C=5) under a per-checkpoint ceiling P0_CKPT_CEILING_H=0.75 GPU-h. Measurement-only: the harness writes sweep_calibrate.json and never a budget or verdict. Substrate Lightning H200; run COMPLETE in 5.825 h. See experiments/exp6-checkpoint-tau-sweep/.

Two provenance invariants are PROVEN, not asserted:

Cumulative trajectory (cumulative_training_proven=True): opt_count increments $1525 \to 3050 \to 6100 \to 12200$ ( $= t \times 61$ batches), monotone, weights-hash distinct per checkpoint, LR advances $0.030$ (t=25) $\to 0.010$ floor (t≥50), never re-ramped.
Probe-RNG isolation (probe_rng_isolated_proven=True): the key_timeline shows dtm.key identical before/after every probe, advancing only across train chunks. The interleaved probe consumed a probe-local jr.PRNGKey(SEED) chain and never touched the training stream — which is what makes the ladder bit-equivalent to a single train(200) and the t=200 anchor interpretable.

The result

$\hat\tau$ does not equilibrate at accessible scale at any checkpoint. Three of four grow $\tau \propto L$ out to $L = 64{,}000$ with no resolution:

| $t$ | LR | $\hat\tau$ (frozen rule) | curve | |---|---|---|---| | 25 | 0.030 | 2094 (stabilized) | $\tau/L\approx 0.16$ to $L=4\text{k}$ , softening (0.127 at 16k), single flat step $2029\to2094$ at $L=32\text{k}$ ; probe stopped at $L=32\text{k}$ | | 50 | 0.010 | UNRESOLVED | $\tau\propto L$ to 64k, $\tau/L \approx 0.136$ , $\tau=8679$ | | 100 | 0.010 | UNRESOLVED | $\tau\propto L$ to 64k, $\tau/L \approx 0.16$ , $\tau=10207$ | | 200 | 0.010 | UNRESOLVED | $\tau\propto L$ to 64k, $\tau/L \approx 0.167$ , $\tau=10666$ |

For t=50/100/200 the proximate stop was the 0.75 h ceiling firing (tau_unresolved_reason="ckpt_ceiling"); the doubling rule had not resolved anyway (e.g. t=50 $L{=}32\text{k}\to64\text{k}$ reldiff $\approx 0.73$ , far above TAU_TOL). t=200 reproduces exp4's doubling probe ( $\tau/L\approx0.16$ to $L=32\text{k}$ ; exp4 $\tau=5280$ , exp6 $\tau=5013$ there — within noise): the reproduction anchor is satisfied.

The conservative scientific reading is registered outcome (i), "slow from the start" — $\tau \propto L$ at all four checkpoints. The $A2\leftrightarrow A6$ operational antagonism is present from very early training (t=25, pre-LR-floor), not only at convergence.

Scope and caveats

This is where the entry earns its keep. The frozen rule's literal output is not outcome (i) — it is outcome (ii) (crossover): by the letter of the rule, t=25 resolved to $\hat\tau = 2094.37$ (reldiff $0.032 < 0.15$ and $L \ge 5\hat\tau$ both hold at $L=32\text{k}$ ). So outcome (i)'s own literal precondition — "UNRESOLVED at all core checkpoints incl. t=25" — is false per the rule. We reach (i) only after a claim-precision audit reclassifies t=25's resolution as a windowing/finite-sample artifact:

(a) t=25 grew $\tau\propto L$ like the others through $L=8\text{k}$ , then "stabilized" on a single sub-tolerance step at $L=32\text{k}$ ( $\tau/L$ collapsing to 0.065) — the very range where t=50/100/200 are still climbing;
(b) t=25's probe stopped at $L=32\text{k}$ , one doubling shorter than the others, so we never saw whether it would resume $\propto L$ growth at $L=64\text{k}$ as the others did.

Those two are the decisive caveats. Conservative verdict: t=25 is a crossover candidate, not a confirmed finite- $\tau$ regime. No threshold was relaxed; the rule's output is reported verbatim alongside the audit.

Even taking t=25 at face value, it fails the A6-vs-utility test: the only rule-resolved checkpoint is the least-trained model, and its Phase-2 $50\cdot\hat\tau$ window would need $\approx 1.05\times10^{5}$ steps. A finite- $\tau$ reversible regime, if it exists here at all, coincides with a barely-useful model — the antagonism is not escaped at any useful checkpoint.

Phase 2 did not run. The gate needs three conjunctive conditions. Condition 2 (MEM_SAFE $\approx 2.9\times10^{5}$ steps $> 50\hat\tau$ ) would actually pass — the $L\le64\text{k}$ reach was a wall-clock-ceiling artifact, not a memory limit. The dispositive failure is condition 1 (no robust finite $\hat\tau$ , per the audit); condition 3 (no declared DECISION: PROCEED) is a procedural backstop. checkpoint_decision.md stays PENDING.

No tag flip. Conditional factorization stays [solid]; the operational claim stays [conjectured], now with at-scale real-chain evidence. This sharpens Risk 5 (the $A6$ gate) and Risk 1 (at-scale tracking) — both stay [open] — and strengthens, does not close, the named $A2\leftrightarrow A6$ structural-obstruction reading toward the fundamental-on-this-substrate side. Single graph, single trajectory, one unconfirmed t=25 candidate: the spine's "fundamentality open" stands.

A budget honesty note: the pre-commitment self-bounded at $\le 3.65$ GPU-h; actual was 5.825 h. Cause is per-doubling JIT/re-trace overhead dominating the (tiny, $\sim 85$ s total) sampling sweeps, letting one extra doubling complete past the ceiling check. No constant was relaxed, and the extra length only gave the chains longer to equilibrate — which they still didn't, strengthening the UNRESOLVED verdicts.

What this feeds: a confirm/refute of the t=25 crossover candidate needs a longer t=25 probe ( $\ge L=64\text{k}$ ) and/or more chains — registered as a follow-up, not claimed here; this entry leaves the $A2\leftrightarrow A6$ obstruction pushed toward, but not settled as, fundamental-on-this-substrate.