Thermodynamic Machine Learning · MMXXVI
Experiment · 4.VI.MMXXVI · Read 5 min

Exp 6 — Checkpoint τ-Sweep: Slow From the Start

Entry 8

If the reversible kernel cannot mix at convergence, the natural next question is whether it ever could, so we walked the same trained trajectory backward in time and measured $\hat\tau$ at four checkpoints.

The question

Exp 4 established that on a converged 60_12 DTM the reversible negative-phase kernel does not equilibrate: $\tau \propto L$ out to large lattice depth, so the integrated autocorrelation time $\tau_{int}$ required by assumption $A6$ is never finite. The open follow-up: is non-equilibration a property of the converged model only (a deep-checkpoint pathology that earlier, less-trained models might escape), or is it fundamental on this substrate, present from the start? The answer distinguishes the two horns of the spine's "fundamentality open" question.

The setup

One cumulative training trajectory (single-input 60_12 DTM-MNIST, seed 0, N_CHAINS=32, vanilla ACP-off, reversible patch live throughout), probed at four checkpoints $t \in \{25, 50, 100, 200\}$ epochs. Each probe runs exp4's non-circular doubling-stability $\hat\tau$ rule (half-Sokal $\tau_{int}$, TAU_TOL=0.15, SOKAL_C=5) under a per-checkpoint ceiling P0_CKPT_CEILING_H=0.75 GPU-h. Measurement-only: the harness writes sweep_calibrate.json and never a budget or verdict. Substrate Lightning H200; run COMPLETE in 5.825 h. See experiments/exp6-checkpoint-tau-sweep/.
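The stopping logic of the doubling-stability rule can be sketched as follows. This is a minimal sketch under stated assumptions, not the exp4 harness: the function name `doubling_tau`, the callable `tau_at`, the `l_max` stand-in for the wall-clock ceiling, and the exact form of the relative difference (change divided by the previous estimate) are all hypothetical. Only TAU_TOL=0.15, SOKAL_C=5, and the two acceptance conditions (sub-tolerance step and $L \ge 5\hat\tau$) come from this entry.

```python
from typing import Callable, Optional, Tuple

TAU_TOL = 0.15   # from the entry: max relative change between doublings
SOKAL_C = 5      # from the entry: require L >= SOKAL_C * tau_hat


def doubling_tau(
    tau_at: Callable[[int], float],  # hypothetical: tau_int estimate at chain length L
    l_start: int,
    l_max: int,                      # stand-in for the wall-clock ceiling (assumption)
) -> Tuple[Optional[float], int]:
    """Double L until tau stabilizes; return (tau_hat or None, last L probed)."""
    length = l_start
    prev = tau_at(length)
    while length * 2 <= l_max:
        length *= 2
        cur = tau_at(length)
        reldiff = abs(cur - prev) / prev   # assumed definition of reldiff
        if reldiff < TAU_TOL and length >= SOKAL_C * cur:
            return cur, length             # resolved by the letter of the rule
        prev = cur
    return None, length                    # UNRESOLVED when the ceiling is reached
```

Fed the two t=25 estimates reported in this entry (about 2029 at 16k and 2094.37 at 32k), the sketch resolves at $L=32\text{k}$. Fed the t=200 estimates (5013 at 32k, 10666 at 64k), it returns unresolved.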

Two provenance invariants are PROVEN, not asserted:

  • Cumulative trajectory (cumulative_training_proven=True): opt_count increments $1525 \to 3050 \to 6100 \to 12200$ ($= t \times 61$ batches), monotone, weights-hash distinct per checkpoint, LR advances $0.030$ (t=25) $\to 0.010$ floor (t≥50), never re-ramped.
  • Probe-RNG isolation (probe_rng_isolated_proven=True): the key_timeline shows dtm.key identical before/after every probe, advancing only across train chunks. The interleaved probe consumed a probe-local jr.PRNGKey(SEED) chain and never touched the training stream — which is what makes the ladder bit-equivalent to a single train(200) and the t=200 anchor interpretable.
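The two invariants could be checked along these lines. The record layouts here (a list of checkpoint dicts and a `key_timeline` of `(kind, key_before, key_after)` tuples) are assumptions for illustration, not the harness's actual schema; only the 61 batches per epoch and the pass conditions are taken from the bullets above. The LR-schedule check is omitted for brevity.

```python
BATCHES_PER_EPOCH = 61  # from the entry: opt_count = t * 61


def cumulative_training_proven(checkpoints):
    """checkpoints: list of dicts with 't', 'opt_count', 'weights_hash' (assumed layout)."""
    counts = [c["opt_count"] for c in checkpoints]
    matches_epochs = all(
        c["opt_count"] == c["t"] * BATCHES_PER_EPOCH for c in checkpoints
    )
    monotone = all(a < b for a, b in zip(counts, counts[1:]))
    distinct = len({c["weights_hash"] for c in checkpoints}) == len(checkpoints)
    return matches_epochs and monotone and distinct


def probe_rng_isolated_proven(key_timeline):
    """key_timeline: list of (kind, key_before, key_after), kind in {'train', 'probe'}."""
    for kind, before, after in key_timeline:
        if kind == "probe" and before != after:
            return False  # a probe touched the training key
        if kind == "train" and before == after:
            return False  # a train chunk failed to advance the key
    return True
```

Comparing the key before and after each probe is what allows the interleaved ladder to be treated as equivalent to one uninterrupted training run.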

The result

$\hat\tau$ does not equilibrate at accessible scale at any checkpoint. Three of four grow $\tau \propto L$ out to $L = 64{,}000$ with no resolution:

| $t$ | LR | $\hat\tau$ (frozen rule) | curve |
|---|---|---|---|
| 25 | 0.030 | 2094 (stabilized) | $\tau/L \approx 0.16$ to $L=4\text{k}$, softening (0.127 at 16k), single flat step $2029 \to 2094$ at $L=32\text{k}$; probe stopped at $L=32\text{k}$ |
| 50 | 0.010 | UNRESOLVED | $\tau \propto L$ to 64k, $\tau/L \approx 0.136$, $\tau = 8679$ |
| 100 | 0.010 | UNRESOLVED | $\tau \propto L$ to 64k, $\tau/L \approx 0.16$, $\tau = 10207$ |
| 200 | 0.010 | UNRESOLVED | $\tau \propto L$ to 64k, $\tau/L \approx 0.167$, $\tau = 10666$ |

For t=50/100/200 the proximate stop was the 0.75 h ceiling firing (tau_unresolved_reason="ckpt_ceiling"); the doubling rule had not resolved anyway (e.g. t=50 $L{=}32\text{k}\to64\text{k}$ reldiff $\approx 0.73$, far above TAU_TOL). t=200 reproduces exp4's doubling probe ($\tau/L \approx 0.16$ to $L=32\text{k}$; exp4 $\tau=5280$, exp6 $\tau=5013$ there, within noise): the reproduction anchor is satisfied.
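The $\tau/L$ ratios quoted in the table and the size of the exp4/exp6 anchor gap can be recomputed from the reported numbers. The sketch below only redoes that arithmetic; the helper names are hypothetical, and normalising the gap by the mean of the two estimates is an assumed convention, not one stated by the harness.

```python
# (t, L, tau) at each checkpoint's final probed length, copied from the table above
FINAL_PROBES = [
    (25, 32_000, 2094.37),
    (50, 64_000, 8679.0),
    (100, 64_000, 10207.0),
    (200, 64_000, 10666.0),
]


def tau_over_l(rows):
    """Return {t: tau / L} for each (t, L, tau) row."""
    return {t: tau / length for t, length, tau in rows}


def relative_gap(a, b):
    """Gap between two estimates, normalised by their mean (assumed convention)."""
    return abs(a - b) / ((a + b) / 2)


ratios = tau_over_l(FINAL_PROBES)
anchor_gap = relative_gap(5280.0, 5013.0)  # exp4 vs exp6 at t=200, L=32k
```

The three later checkpoints end with $\tau/L$ between roughly 0.136 and 0.167, while t=25 ends near 0.065; the anchor gap comes out at about 5%.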

The conservative scientific reading is registered outcome (i), "slow from the start": $\tau \propto L$ at all four checkpoints. The $A2\leftrightarrow A6$ operational antagonism is present from very early training (t=25, pre-LR-floor), not only at convergence.

Scope and caveats

This is where the entry earns its keep. The frozen rule's literal output is not outcome (i); it is outcome (ii) (crossover): by the letter of the rule, t=25 resolved to $\hat\tau = 2094.37$ (reldiff $0.032 < 0.15$ and $L \ge 5\hat\tau$ both hold at $L=32\text{k}$). So outcome (i)'s own literal precondition, "UNRESOLVED at all core checkpoints incl. t=25", is false per the rule. We reach (i) only after a claim-precision audit reclassifies t=25's resolution as a windowing/finite-sample artifact:

  • (a) t=25 grew $\tau\propto L$ like the others through $L=8\text{k}$, then "stabilized" on a single sub-tolerance step at $L=32\text{k}$ ($\tau/L$ collapsing to 0.065), the very range where t=50/100/200 are still climbing;
  • (b) t=25's probe stopped at $L=32\text{k}$, one doubling shorter than the others, so we never saw whether it would resume $\propto L$ growth at $L=64\text{k}$ as the others did.

Those two are the decisive caveats. Conservative verdict: t=25 is a crossover candidate, not a confirmed finite-$\tau$ regime. No threshold was relaxed; the rule's output is reported verbatim alongside the audit.

Even taking t=25 at face value, it fails the A6-vs-utility test: the only rule-resolved checkpoint is the least-trained model, and its Phase-2 $50\cdot\hat\tau$ window would need $\approx 1.05\times10^{5}$ steps. A finite-$\tau$ reversible regime, if it exists here at all, coincides with a barely-useful model: the antagonism is not escaped at any useful checkpoint.

Phase 2 did not run. The gate needs three conjunctive conditions. Condition 2 (MEM_SAFE $\approx 2.9\times10^{5}$ steps $> 50\hat\tau$) would actually pass: the $L\le64\text{k}$ reach was a wall-clock-ceiling artifact, not a memory limit. The dispositive failure is condition 1 (no robust finite $\hat\tau$, per the audit); condition 3 (no declared DECISION: PROCEED) is a procedural backstop. checkpoint_decision.md stays PENDING.
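The conjunctive gate can be written out as a small check. This is a sketch of the logic as described in this entry, not the harness's code: the `GateInputs` container, its field names, and the returned keys are hypothetical, while the factor 50, the MEM_SAFE figure, and the three conditions are taken from the text.

```python
from dataclasses import dataclass

WINDOW_FACTOR = 50  # from the entry: Phase-2 window is 50 * tau_hat


@dataclass
class GateInputs:            # hypothetical container, not the harness's schema
    tau_hat: float           # rule output (here the t=25 value)
    tau_is_robust: bool      # condition 1, as judged by the audit
    mem_safe_steps: float    # MEM_SAFE step budget
    decision: str            # declared decision, e.g. "PENDING" or "PROCEED"


def phase2_gate(g: GateInputs) -> dict:
    """Evaluate the three conjunctive conditions and report each one."""
    window = WINDOW_FACTOR * g.tau_hat
    conds = {
        "c1_robust_finite_tau": g.tau_is_robust,
        "c2_window_fits_memory": g.mem_safe_steps > window,
        "c3_decision_proceed": g.decision == "PROCEED",
    }
    return {"window_steps": window, **conds, "run_phase2": all(conds.values())}
```

Calling `phase2_gate(GateInputs(2094.37, False, 2.9e5, "PENDING"))` gives a window of about $1.05\times10^{5}$ steps, with condition 2 passing and the gate still closed because conditions 1 and 3 fail.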

No tag flip. Conditional factorization stays [solid]; the operational claim stays [conjectured], now with at-scale real-chain evidence. This sharpens Risk 5 (the $A6$ gate) and Risk 1 (at-scale tracking), both of which stay [open], and strengthens, but does not close, the named $A2\leftrightarrow A6$ structural-obstruction reading toward the fundamental-on-this-substrate side. Single graph, single trajectory, one unconfirmed t=25 candidate: the spine's "fundamentality open" stands.

A budget honesty note: the pre-commitment self-bounded at $\le 3.65$ GPU-h; actual was 5.825 h. The cause is per-doubling JIT/re-trace overhead dominating the (tiny, $\sim 85$ s total) sampling sweeps, letting one extra doubling complete past the ceiling check. No constant was relaxed, and the extra length only gave the chains longer to equilibrate, which they still did not, strengthening the UNRESOLVED verdicts.


What this feeds: a confirm/refute of the t=25 crossover candidate needs a longer t=25 probe ($L \ge 64\text{k}$) and/or more chains, registered as a follow-up and not claimed here; this entry leaves the $A2\leftrightarrow A6$ obstruction pushed toward, but not settled as, fundamental-on-this-substrate.

— fin. —