Thermodynamic Machine Learning · MMXXVI
Negative Result · 20.VI.MMXXVI · Read 5 min

What Didn’t Work — and Why the Negative Results Matter

Entry 23

A synthesis of the program's honest negatives: the operational trainability claim is gated not by a budget shortfall but by a structural obstruction, and each failure is sharp enough to be a result.

The Q-program proves a conditional factorization $Q_{op} \approx Q_{struct}^{\perp}$ in regime A1–A8 + plateau + F4 ([solid], written proof O1–O6 over the single proven-here lemma O1.c). The operational claim, that this holds on a real DTM at the $K$ one actually runs, stays [conjectured], gated by A2 ($\pi$-reversibility, which the no-leakage machinery needs) and A6 ($K \gg \tau_{int}$). What follows is the record of trying to meet those gates.

The A2 ↔ A6 obstruction

The question. Can a kernel be both reversible (A2) and fast-mixing enough that $K \gg \tau_{int}$ (A6)? Nothing formally forbids it, but the factorization is about the multimodal plateau, which is exactly where the two become operationally antagonistic.

The setup. experiments/exp4-reversible-50tau/ fixed exp3's blockers: a genuinely reversible symmetrized 4-block Gibbs kernel $M = \tfrac12(P_{fwd}+P_{rev})$ (self-adjointness gated to $\sim 10^{-17}$), then a non-circular doubling-stability probe for $\hat\tau$ on the real 60_12 MNIST DTM at $t=200$ (Lightning H200).
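To make the symmetrization concrete, here is a minimal sketch on a hypothetical toy: four ±1 spins with random couplings, one spin per block, small enough that every kernel is a dense 16 × 16 matrix. Each single-block Gibbs update is π-reversible, but a fixed forward sweep is not; averaging the forward and reversed sweeps gives a kernel whose detailed-balance residual sits at round-off, the kind of check the text calls gating self-adjointness. The model, couplings, and helper names (`block_gibbs`, `reversibility_residual`) are illustrative assumptions, not the exp4 code.

```python
import itertools

import numpy as np

# Hypothetical toy: 4 binary spins, one spin per block, random couplings.
rng = np.random.default_rng(0)
n = 4
J = np.triu(rng.normal(size=(n, n)), 1)
J = J + J.T                       # symmetric couplings, zero diagonal
h = rng.normal(size=n)

states = np.array(list(itertools.product([-1, 1], repeat=n)))  # all 16 configurations
energy = -0.5 * np.einsum("si,ij,sj->s", states, J, states) - states @ h
pi = np.exp(-energy)
pi /= pi.sum()                    # exact target distribution


def block_gibbs(b):
    """Kernel that resamples block b from pi(. | all other blocks)."""
    rest = np.delete(states, b, axis=1)
    P = np.zeros((len(states), len(states)))
    for x in range(len(states)):
        same_rest = np.all(rest == rest[x], axis=1)  # states differing from x only on block b
        P[x, same_rest] = pi[same_rest] / pi[same_rest].sum()
    return P


def reversibility_residual(P):
    """max |pi(x) P(x, y) - pi(y) P(y, x)|; zero iff P is pi-reversible."""
    flow = pi[:, None] * P
    return float(np.abs(flow - flow.T).max())


blocks = [block_gibbs(b) for b in range(n)]
P_fwd = np.linalg.multi_dot(blocks)          # sweep blocks 0, 1, 2, 3
P_rev = np.linalg.multi_dot(blocks[::-1])    # sweep blocks 3, 2, 1, 0
M = 0.5 * (P_fwd + P_rev)                    # symmetrized kernel

print("forward sweep residual:", reversibility_residual(P_fwd))  # not round-off small in general
print("symmetrized residual:  ", reversibility_residual(M))      # round-off level
```

The average works because each block update satisfies $D P_b = P_b^{\top} D$ with $D = \mathrm{diag}(\pi)$, so $(D P_{fwd})^{\top} = D P_{rev}$ and $D M$ is symmetric.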

The result. $\tau_{max}$ does not stabilize; it grows dead-linearly in trajectory length $L$, from $166.7$ to $5{,}280$ as $L$ goes from $1{,}000$ to $32{,}000$, with $\tau/L \approx 0.16$ across all six lengths (five doublings), and the doubling-stability rule $|\tau(2L)-\tau(L)|/\tau(L) < 0.15$ is never met. So $\hat\tau$ is UNRESOLVED, the $\gtrsim 50\tau$ windows are uninstantiable, and A6 is unreachable. Registered outcome (F): P0-HALT, P1–P5 unrun. exp6 reproduced $\tau \propto L$ at every checkpoint $t \in \{25, 50, 100, 200\}$, pushing the reading from "deep-checkpoint effect" toward "slow from the start."
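The doubling-stability rule fits in a few lines. The sketch below assumes lengths that double exactly and feeds it a hypothetical $\tau = 0.165\,L$ (the quoted endpoints give $166.7/1{,}000 \approx 0.167$ and $5{,}280/32{,}000 = 0.165$); the intermediate values are illustrative, not the measured ones. When $\tau \propto L$, the relative change is exactly 1 at every doubling, so no amount of extra length can satisfy a 0.15 tolerance. A saturating curve, also hypothetical, is included for contrast.

```python
import numpy as np


def doubling_stability(lengths, taus, tol=0.15):
    """Relative change |tau(2L) - tau(L)| / tau(L) at each doubling, and whether any meets tol."""
    lengths = np.asarray(lengths, dtype=float)
    taus = np.asarray(taus, dtype=float)
    if not np.allclose(lengths[1:], 2.0 * lengths[:-1]):
        raise ValueError("trajectory lengths must double at each step")
    rel = np.abs(taus[1:] - taus[:-1]) / taus[:-1]
    return rel, bool(np.any(rel < tol))


L = 1000.0 * 2.0 ** np.arange(6)          # 1,000, 2,000, ..., 32,000

# Hypothetical tau proportional to L (slope ~0.165, as in the quoted endpoints).
rel_lin, stable_lin = doubling_stability(L, 0.165 * L)
print(rel_lin, stable_lin)                 # relative change 1.0 at every doubling: never stable

# Hypothetical saturating estimate, the shape a resolved tau-hat would take.
rel_sat, stable_sat = doubling_stability(L, 50.0 * (1.0 - np.exp(-L / 500.0)))
print(rel_sat, stable_sat)
```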

Scope and caveats. The exp3→exp4 jump is confounded (kernel and window both changed), so $\tau \ge 5{,}280$ is not pure truncation; the cleaner attribution is to the reversible kernel mixing far slower. The deeper tension: exp3's faster kernel violated A2, and the A2-valid kernel is the slow one. $\tau \propto L$ reads either as a genuine near-zero spectral gap or as inadequate burn-in, and both give the same operational conclusion. No tag flip: A6 is unreachable here, which is not the same as the conjecture being false.

The thermodynamic-length wall

The question. If a single reversible kernel mixes too slowly, can a reversible parallel-tempering ladder bridge a hot, free-mixing regime down to the cold target — cheaply?

The setup. experiments/exp19-hotter-top-first/ ran an equal-acceptance reversible-PT mixing probe on the trained 60_12 DTM (single-input conditional, INPUT_IDX=0), with a hot-top sweep ($\alpha_{top}$ floored at 0.01) and an analytic rung count $R^*(\alpha_{top}) = 1 + \mathrm{round}\big(\beta \int_{\alpha_{top}}^{1} \sqrt{C}\, d\alpha \,/\, \delta^*\big)$ with $\delta^* = 1.683$, where $C$ is the specific heat. The trained-weight-refresh guard (refresh_ok=True) passed, confirming the fix for the bug that invalidated the earlier exp15/16 reads.
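As a sketch of how the analytic rung count is evaluated, the snippet integrates $\beta\sqrt{C}$ with the trapezoid rule and applies $R^* = 1 + \mathrm{round}(\text{length}/\delta^*)$. The specific-heat profile `C_toy` and the value of `beta` are hypothetical placeholders; the DTM's measured $C(\alpha)$ is not reproduced here. Python's `round` breaks exact ties to even, and the text does not say which tie rule exp19 uses.

```python
import numpy as np

DELTA_STAR = 1.683  # per-rung step quoted in the text


def thermodynamic_length(C, beta, alpha_top, n_grid=4001):
    """beta * integral_{alpha_top}^{1} sqrt(C(alpha)) d alpha via the trapezoid rule."""
    a = np.linspace(alpha_top, 1.0, n_grid)
    s = np.sqrt(C(a))
    return beta * float(np.sum(0.5 * (s[1:] + s[:-1]) * np.diff(a)))


def rung_count(alpha_top, C, beta, delta=DELTA_STAR):
    """R*(alpha_top) = 1 + round(length / delta*)."""
    return 1 + round(thermodynamic_length(C, beta, alpha_top) / delta)


def C_toy(a):
    # Hypothetical specific heat that rises toward the hot end; not the DTM's profile.
    return 1.0 / a


for top in (0.01, 0.02, 0.1, 0.25, 0.5):
    print(top, rung_count(top, C_toy, beta=50.0))
```

Because $C \ge 0$, the integral and hence $R^*$ can only grow as $\alpha_{top}$ is pushed hotter, which is why a hotter top that cures decorrelation also costs rungs.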

The result. A hotter top does cure decorrelation: the single replica reaches $\tau \le 25$ at $\alpha_{top} = 0.02$ ($\tau = 15.8$; at $\alpha_{top} = 0.01$, $\tau = 2.2$), so the slowness is a temperature effect, not local-kernel non-ergodicity. But the cheapest decorrelating span $[0.02, 1.0]$ needs $R^* = 136$ rungs, above the budget $R_{max} = 96$. The two frontiers do not overlap: decorrelation needs $\alpha_{top} \le 0.02$, tractability needs $\alpha_{top} \gtrsim 0.25$ ($R^*(0.25) = 94$). The obstruction is a thermodynamic-length wall, $\beta \int_{0.02}^{1} \sqrt{C}\, d\alpha \approx 227$, and the verdict is a decisive STOP at Stage A.
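The quoted numbers are mutually consistent, which a few lines of arithmetic confirm. Inverting $R^* = 1 + \mathrm{round}(\text{length}/\delta^*)$ gives a length of $(R^*-1)\,\delta^*$, up to $\pm\delta^*/2$ from rounding: about 227.2 for $R^* = 136$, matching the wall quoted above, and about 156.5 for $R^* = 94$. Since the integral is additive, the difference (roughly 70.7 units, or 42 rungs) is what the ladder would have to cover between $\alpha_{top} = 0.02$ and $0.25$ alone. This is a back-of-envelope reading of the reported values, not an output of exp19.

```python
DELTA_STAR = 1.683              # per-rung step quoted in the text
R_MAX = 96                      # rung budget quoted in the text
QUOTED = {0.02: 136, 0.25: 94}  # alpha_top -> R*, as reported for exp19


def implied_length(r_star, delta=DELTA_STAR):
    """Invert R* = 1 + round(length / delta): length is (R* - 1) * delta, within +/- delta / 2."""
    return (r_star - 1) * delta


for top, r in QUOTED.items():
    print(f"alpha_top={top}: length ~ {implied_length(r):.1f}, within budget: {r <= R_MAX}")

# Length and rungs needed between the two frontiers alone.
extra_length = implied_length(QUOTED[0.02]) - implied_length(QUOTED[0.25])
extra_rungs = QUOTED[0.02] - QUOTED[0.25]
print(f"between 0.02 and 0.25: ~{extra_length:.1f} length units, {extra_rungs} rungs")
```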

Scope and caveats. The result is config-scoped (60_12, seed 0, $t=200$, this kernel and ladder family) and is never a verdict on whether reversible PT fundamentally fails. $R^*$ is a Gaussian-overlap lower bound, so the true cost can only be higher. exp18 had already shown that PT fails by schedule, not by overlap: a uniform-$\Delta\alpha$ ladder over $[0.5, 1.0]$ missed the acceptance band not because spacing can't move acceptance (it moves it $\times 33$) but because the DTM's specific heat is non-uniform, so the hot edge clears the floor only after the cold edges overshoot the ceiling (PT-MARGINAL). exp19 then resolved both of exp18's suspects, and the ladder is still intractable. MEASURE-ONLY; moves no tag.
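The exp18 diagnosis, that a uniform $\Delta\alpha$ ladder cannot equalize acceptance when $C$ is non-uniform, can be illustrated with the standard fix of placing rungs at equal thermodynamic-length increments. The sketch assumes, in line with the thermodynamic-length picture the text uses, that neighbour acceptance is governed by the length of each gap; the bumped profile `C_toy`, `beta`, and the 11-rung ladder are hypothetical. Uniform spacing leaves the gaps that straddle the bump far longer than the rest, while inverting the cumulative length equalizes them.

```python
import numpy as np


def cumulative_length(a, C, beta):
    """Running beta * integral of sqrt(C) along grid a (trapezoid rule), starting at 0."""
    s = beta * np.sqrt(C(a))
    return np.concatenate([[0.0], np.cumsum(0.5 * (s[1:] + s[:-1]) * np.diff(a))])


def gap_lengths(rungs, C, beta, n_sub=401):
    """Thermodynamic length of each gap between adjacent rungs."""
    return np.array([cumulative_length(np.linspace(lo, hi, n_sub), C, beta)[-1]
                     for lo, hi in zip(rungs[:-1], rungs[1:])])


def equal_length_ladder(C, beta, lo, hi, n_rungs, n_grid=20001):
    """Rungs at equal length increments, by inverting the monotone cumulative length."""
    a = np.linspace(lo, hi, n_grid)
    cum = cumulative_length(a, C, beta)
    return np.interp(np.linspace(0.0, cum[-1], n_rungs), cum, a)


def C_toy(a):
    # Hypothetical non-uniform specific heat with a bump near alpha = 0.6; not the DTM's.
    return 1.0 + 20.0 * np.exp(-(((a - 0.6) / 0.05) ** 2))


beta, alpha_lo, alpha_hi, n_rungs = 10.0, 0.5, 1.0, 11
uniform = np.linspace(alpha_lo, alpha_hi, n_rungs)
equal = equal_length_ladder(C_toy, beta, alpha_lo, alpha_hi, n_rungs)
for name, rungs in (("uniform", uniform), ("equal-length", equal)):
    g = gap_lengths(rungs, C_toy, beta)
    print(f"{name:>12}: shortest {g.min():.3f}, longest {g.max():.3f}, "
          f"relative spread {g.std() / g.mean():.3f}")
```

Equalizing lengths fixes the schedule but not the total: the sum of gap lengths is the same for both ladders, which is why a long enough span still hits a rung budget.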

You cannot optimize the proxy directly

The question. $Q_{struct}^{\perp}$ is computable without training to convergence and differentiable in $J$. Can you therefore use it as an in-loop objective (HTDML) to steer a model toward trainability?

The setup. experiments/exp11-htdml-objective-lambda-sweep/ swept the objective weight $\lambda \in \{0, 0.1, 0.3, 1.0, 3.0\}$ over 4 cells, with a matched-/held-crossing design and a KL-compatibility task guard.
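To see the shape of the risk a λ-sweep with a task guard is designed to expose, here is a deliberately tiny toy, not the HTDML objective: one parameter $\theta$, a task loss $(\theta - 1)^2$ standing in for the KL task, and a proxy equal to $\theta$ that the objective rewards with weight $\lambda$. Gradient descent settles at $\theta^* = 1 + \lambda/2$, so the proxy always rises with $\lambda$ while the task loss grows as $\lambda^2/4$; a hypothetical guard tolerance of 0.01 then passes only the two smallest weights.

```python
# Hypothetical toy, NOT the HTDML objective: minimize (theta - 1)^2 - lam * theta,
# where (theta - 1)^2 stands in for the task loss and theta for the proxy.
TASK_TOL = 0.01  # hypothetical task-guard tolerance


def minimize(lam, lr=0.1, steps=500):
    """Plain gradient descent from the task optimum theta = 1."""
    theta = 1.0
    for _ in range(steps):
        grad = 2.0 * (theta - 1.0) - lam  # derivative of the weighted objective
        theta -= lr * grad
    return theta


results = {}
for lam in (0.0, 0.1, 0.3, 1.0, 3.0):
    theta = minimize(lam)
    task_loss = (theta - 1.0) ** 2
    results[lam] = (theta, task_loss, task_loss <= TASK_TOL)
    print(f"lambda={lam}: proxy={theta:.3f} task_loss={task_loss:.4f} "
          f"guard={'pass' if task_loss <= TASK_TOL else 'fail'}")
```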

The result. The computability and differentiability-in-use half passes: up to $32{,}000$ in-loop pooled-$Q$ evaluations with gradients agree with the unmodified numpy spectral pipeline to $2.26\times10^{-14}$ (P1). But the steering half does not steer (registered Outcome 2). As $\lambda$ rises through $0.1, 0.3, 1.0, 3.0$, the matched-crossing ratio climbs ($1.125 \to 2.74 \to 101.4 \to$ undefined) while the task-guard pass fraction collapses ($4/4 \to 2/4 \to 1/4 \to 0/4$). The only verdict-eligible arm, $\lambda = 0.1$, fails Leg 1 (median $1.125 < 1.5$, reproducing exp9 exactly) and its genuine-channel gate ($G\gamma$ median $1.010 > 0.90$; the $\gamma_{eff}$ channel never engages). The large ratios at $\lambda \ge 0.3$ are non-verdict quantities: they buy $Q$ by anti-convergence that abandons the KL task. This is Goodhart's law: push the proxy past the bar and you destroy what it proxies for.
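The gate logic for the sweep can be sketched using only the reported medians. Three pieces are assumptions: the reading that the four reported values belong to $\lambda = 0.1, 0.3, 1.0, 3.0$ in order; the eligibility rule that an arm must pass the task guard in every cell (chosen because it singles out $\lambda = 0.1$, as the text does); and the boundary conventions $\ge 1.5$ and $\le 0.90$. The function and constant names are hypothetical.

```python
LEG1_MIN = 1.5      # Leg 1 bar on the matched-crossing ratio median (from the text)
GGAMMA_MAX = 0.90   # genuine-channel bar on the G_gamma median (from the text)
N_CELLS = 4

# lambda -> (task-guard passes out of N_CELLS, matched-crossing ratio median).
# The mapping of reported values to lambdas is an assumption (see lead-in).
ARMS = {0.1: (4, 1.125), 0.3: (2, 2.74), 1.0: (1, 101.4), 3.0: (0, None)}
G_GAMMA_MEDIAN = {0.1: 1.010}  # reported only for the eligible arm


def verdict(guard_passes, ratio_median, g_gamma_median):
    """Classify one lambda arm; the all-cells-pass eligibility rule is assumed."""
    if guard_passes < N_CELLS:
        return "non-verdict"
    leg1_ok = ratio_median >= LEG1_MIN
    channel_ok = g_gamma_median <= GGAMMA_MAX
    return "steers" if leg1_ok and channel_ok else "does not steer"


for lam, (passes, ratio) in ARMS.items():
    print(lam, verdict(passes, ratio, G_GAMMA_MEDIAN.get(lam)))
```

Ordering the eligibility check first mirrors the text's point: the large ratios at higher $\lambda$ never reach the Leg 1 comparison at all.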

Scope and caveats. Demo-level scope ($N \le 10$, one seed table, this $\lambda$ ladder): a scoped negative, never a claim that the proxy "cannot steer." It is a predictor measurement, not a $Q_{op}$ validation: no sampling ran. No tag moves; G2 is untouched.


What this feeds: the two pre-registered pivots from exp19. The first is to switch sampler families (hierarchical PT, simulated tempering, population annealing), at the risk of breaking A2 and trading a mixing problem for a theorem problem. The second is the theory route: a $\tau_{int}$-robust $Q_{op}$ estimator that validates the operational claim without a fast-mixing kernel. In this field, a sharp, well-instrumented negative (A6 unreachable, the $R^* = 136 > 96$ wall, Goodhart on the proxy) is the contribution.

— fin. —