Thermodynamic Machine Learning · MMXXVI
Experiment · 27.V.MMXXVI · Read 4 min

Exp 1 — Exact Diagonalization: Risk-1 Fires

Entry 2

The experiment built to decide Risk 1 returned a verdict the pre-commitment did not anticipate: the single-$\gamma$ factorization fails for a structural reason before it fails for a cluster reason.

The question

Risk 1 of the trainability theorem asks whether the conjectured gradient-SNR factorization $Q \asymp (\gamma K / 2)\, R$ survives exact computation. The structural ratio is

$$R = \frac{\lVert g \rVert^2}{\sum_a w_a \operatorname{Var}_\pi[f_a]}, \qquad w_a = \frac{\hat f_{a,2}^2}{\operatorname{Var}_\pi[f_a]},$$

anchored to the slowest Gibbs mode $\phi_2$ (eigenvalue $\sigma_2$). The pre-committed question: in isolated spectra does $Q_{struct}$ track the operational SNR $Q_{op}$, and in cluster spectra does single-$\gamma$ over-predict? See experiments/exp1-exact-diag/.
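Taking the two definitions above at face value, substituting the weights into the denominator makes the dependence on the mode-2 overlaps explicit. This is a one-line algebraic consequence (assuming every $\operatorname{Var}_\pi[f_a] \neq 0$), not an additional result of the experiment:

```latex
\sum_a w_a \operatorname{Var}_\pi[f_a]
  = \sum_a \frac{\hat f_{a,2}^2}{\operatorname{Var}_\pi[f_a]}\,\operatorname{Var}_\pi[f_a]
  = \sum_a \hat f_{a,2}^2
\quad\Longrightarrow\quad
R = \frac{\lVert g \rVert^2}{\sum_a \hat f_{a,2}^2}.
```

Read this way, $R$ is finite only when at least one observable $f_a$ has nonzero overlap with $\phi_2$.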

The setup

Frozen pre-registered run (experiment.py, 80 cells, 264 s, pure numpy/scipy on a laptop) plus an exploratory follow-up (followup.py, 96 cells). Families: Curie–Weiss, SK, and planted Hopfield at $M$ patterns, swept over $\beta \times \delta \times K$ with $N \le 14$. Constants per the pre-commitment: $\tau = 3$, $c = 3$, $m = 3$, burn-in $B = \lceil 5/\gamma \rceil$, seed $0$. The kernel is a reversible single-site random-scan Gibbs sampler; we diagonalize it exactly to get $\sigma_2$, the slow manifold, and the exact $Q_{op}$.

Validity checks all pass. Detailed balance $\pi_x P_{xy} = \pi_y P_{yx}$ holds to residual $\le 5\times10^{-18}$; stationarity $\pi P = \pi$ to $\le 10^{-16}$; $\sigma_1 = 1$ asserted every cell. The headline-method check (exact bias+window MSE against a genuinely non-stationary MC chain: uniform start, real burn-in, $400$ seeds) matches within $0.7$ to $3.3\%$ across 5 cells, so the stationary-window approximation is sound.
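The three structural checks can be reproduced on a toy cell. The following is a minimal sketch, not the code in experiment.py: the function name `random_scan_gibbs`, the SK-like random couplings, and the choices $N = 6$, $\beta = 1$ are all assumptions made for illustration. It builds the dense random-scan single-site Gibbs kernel, then measures the detailed-balance and stationarity residuals and the top eigenvalue.

```python
import numpy as np

def random_scan_gibbs(J, beta):
    """Dense random-scan single-site Gibbs kernel for E(x) = -1/2 x^T J x, x in {-1,+1}^N.

    Hypothetical helper for illustration; J is assumed symmetric with zero diagonal.
    """
    N = J.shape[0]
    n = 2 ** N
    idx = np.arange(n)
    # spins[k, i] is spin i of state k, read off bit i of the state index
    spins = 2 * ((idx[:, None] >> np.arange(N)) & 1) - 1
    energy = -0.5 * np.einsum("ki,ij,kj->k", spins, J, spins)
    pi = np.exp(-beta * (energy - energy.min()))
    pi /= pi.sum()
    P = np.zeros((n, n))
    for i in range(N):
        flipped = idx ^ (1 << i)                   # state with spin i flipped
        p_flip = pi[flipped] / (pi[flipped] + pi)  # Gibbs conditional for the flipped value
        P[idx, flipped] += p_flip / N
        P[idx, idx] += (1.0 - p_flip) / N
    return pi, P

# Toy cell: SK-like Gaussian couplings (an assumption, not the pre-registered grid)
rng = np.random.default_rng(0)
N = 6
J = np.triu(rng.normal(size=(N, N)) / np.sqrt(N), 1)
J = J + J.T
pi, P = random_scan_gibbs(J, beta=1.0)

flow = pi[:, None] * P
db_residual = np.abs(flow - flow.T).max()    # detailed balance: pi_x P_xy = pi_y P_yx
stat_residual = np.abs(pi @ P - pi).max()    # stationarity: pi P = pi

# A reversible kernel is similar to the symmetric matrix D^{1/2} P D^{-1/2}
d = np.sqrt(pi)
S = d[:, None] * P / d[None, :]
sigma = np.sort(np.linalg.eigvalsh((S + S.T) / 2))[::-1]

print("detailed-balance residual:", db_residual)
print("stationarity residual:", stat_residual)
print("sigma_1, sigma_2:", sigma[0], sigma[1])
```

Diagonalizing the symmetrized matrix rather than `P` itself keeps the eigenvalues real and lets `eigvalsh` be used, which is the main design choice in the sketch.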

The result

Risk 1 is confirmed and sharpened — not closed. Two findings.

1. Symmetry mis-anchoring (the discovery). With field $b = 0$ these EBMs carry $Z_2$ spin-flip symmetry. The slowest mode $\phi_2$ is odd; the pairwise gradient observables $f_{ij} = -x_i x_j$ are even. So $\hat f_{a,2}^2 \approx 0$ (confirmed numerically at the $10^{-24}$ to $10^{-29}$ floor) and the single-$\gamma$ ratio $R$ blows up. $Q_{struct}$ over-predicts $Q_{op}$ by $10^{26}$ to $10^{30}$: a divide-by-symmetry-zero, not a finite error. The gradient SNR is set by the slowest observable-overlapping (even) mode, not by $\sigma_2$.

2. Cluster in the observable-relevant sector. Once anchored to the modes the gradient actually overlaps, multimodal Hopfield cells show a genuine cluster of slow even modes. A single relevant gap is insufficient (tracks $Q_{op}$ in $23/48$ cells); the multi-mode cluster correction restores tracking ($45/48$). Both fixes are necessary at $b = 0$.
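The symmetry argument in finding 1 can be checked on a single small cell. The sketch below is illustrative only and makes several assumptions not taken from the entry: a Curie–Weiss model with $J_{ij} = 1/N$, $N = 6$, $\beta = 1.5$, $b = 0$; the overlap defined as the $\pi$-weighted inner product $\hat f_{a,k} = \sum_x \pi(x) f_a(x) \phi_k(x)$; and a hypothetical tolerance of $10^{-12}$ for deciding that a mode overlaps the observable. It measures the parity of $\phi_2$ under the global flip, its overlap with one pairwise observable, and the index of the slowest mode that does overlap.

```python
import numpy as np

N, beta = 6, 1.5                 # illustrative values, not the pre-registered grid
n = 2 ** N
idx = np.arange(n)
spins = 2 * ((idx[:, None] >> np.arange(N)) & 1) - 1   # spins[k, i] in {-1, +1}

# Curie-Weiss energy at b = 0 with J_ij = 1/N: E = -(m^2 - N) / (2N)
m = spins.sum(axis=1)
energy = -(m ** 2 - N) / (2 * N)
pi = np.exp(-beta * energy)
pi /= pi.sum()

# Random-scan single-site Gibbs kernel
P = np.zeros((n, n))
for i in range(N):
    flipped = idx ^ (1 << i)
    p_flip = pi[flipped] / (pi[flipped] + pi)
    P[idx, flipped] += p_flip / N
    P[idx, idx] += (1.0 - p_flip) / N

# Symmetrize and diagonalize; phi_k = v_k / sqrt(pi) are the modes in L2(pi)
d = np.sqrt(pi)
S = d[:, None] * P / d[None, :]
vals, vecs = np.linalg.eigh((S + S.T) / 2)
order = np.argsort(vals)[::-1]
sigma, vecs = vals[order], vecs[:, order]

# Global spin flip maps state index k to n - 1 - k, i.e. reverses the vector
parity_2 = vecs[::-1, 1] @ vecs[:, 1]       # close to -1 for an odd mode, +1 for an even one

# One pairwise observable f_01 = -x_0 x_1 (even under the flip)
f = -spins[:, 0] * spins[:, 1]
overlap = vecs.T @ (d * f)                  # overlap[k] = sum_x pi(x) f(x) phi_k(x)

# Slowest non-trivial mode with overlap above the (assumed) tolerance
relevant = next(k for k in range(1, n) if overlap[k] ** 2 > 1e-12)

print("parity of phi_2:", parity_2)
print("squared overlap with phi_2:", overlap[1] ** 2)
print("sigma_2:", sigma[1], " observable-relevant sigma:", sigma[relevant])
```

In this toy cell the squared overlap with $\phi_2$ sits at floating-point noise, so any ratio that divides by it is ill-defined, while the first overlapping mode has a strictly smaller eigenvalue than $\sigma_2$.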

The pre-registered predicate verdicts: P1 (isolated regime tracks) is NULL: no dense cell anywhere had an isolated $\sigma_2$, so the premise never occurs and P1 is not testable as framed. P2 (cluster size monotone in $M$) FAILS as a law: raw $|C_3|$ is non-monotone (e.g. $N=8,\beta=2$: $M=1\ldots4 \to [1,3,1,1]$). P3 (single-$\gamma$ over-predicts in cluster cells) passes literally ($31/31$) but degenerately: the over-prediction is the $10^{26}$ to $10^{30}$ blowup, right direction, wrong mechanism. P4 ($\gamma_{eff} + R^C$ repair) is PARTIAL ($13/31$), because its cluster set was anchored to the symmetry-odd $\sigma_2$.

The clean finite Risk-1 mechanism does appear: in the symmetry-broken $b \ne 0$ run the naive predictor becomes well-defined and over-predicts by a finite $\sim 8$ to $9\times$ in cluster cells, tracking ($\sim 0.5\times$) in isolated cells.

Scope and caveats

This is construction-confirmed, not validated: no tag flip. The observable-relevant predictor and the $b \ne 0$ run are post-hoc/exploratory, defined after seeing the degeneracy; they upgrade no artifact tag. The single-site random-scan kernel carries a weak-coupling degenerate manifold at $\sigma = 1 - 1/N$ unrelated to the energy landscape; DTM uses block-Gibbs, which exp2 must check. Small $N \le 14$, controlled planted/random families: this reveals the mechanism, not asymptotic prevalence in trained DTMs. exp1 identifies a necessary correction to the factorization; it does not prove the corrected form.
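The kernel-induced manifold at $\sigma = 1 - 1/N$ is easiest to see in the exactly uncoupled limit. The sketch below assumes $\beta = 0$ as a stand-in for "weak coupling" and uses a hypothetical helper `site_update`; at $\beta = 0$ each single-site Gibbs update just resamples one spin uniformly, so the kernel is an average of Kronecker products and its spectrum can be counted directly.

```python
from functools import reduce
from math import comb

import numpy as np

N = 5                              # illustrative size
U = np.full((2, 2), 0.5)           # resample one spin uniformly (the beta = 0 conditional)
I2 = np.eye(2)

def site_update(i):
    """Kernel that resamples spin i and leaves the other spins untouched (hypothetical helper)."""
    return reduce(np.kron, [U if j == i else I2 for j in range(N)])

# Random scan: pick one of the N sites uniformly, then resample it
P = sum(site_update(i) for i in range(N)) / N

sigma = np.sort(np.linalg.eigvalsh(P))[::-1]            # P is symmetric at beta = 0
levels, counts = np.unique(np.round(sigma, 10), return_counts=True)

for level, count in zip(levels[::-1], counts[::-1]):
    print(f"sigma = {level:.1f}  multiplicity = {count}")
```

The eigenvalues come out as $1 - k/N$ with multiplicity $\binom{N}{k}$, so the level $1 - 1/N$ is $N$-fold degenerate with no energy function involved at all. This matches the caveat's reading that the manifold is a property of the single-site update scheme rather than of the landscape.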

What this feeds

Risk 1 in the trainability theorem moves [open] → [open — sharpened]: the factorization requires (a) an observable projection (the relevant gap is to the slowest mode $g$ overlaps, not $\sigma_2$) and (b) a $\gamma_{eff}$/multi-mode cluster correction. The conjectured $Q \asymp (\gamma K/2)R$ stays conjectured. The [solid] variance leg's $2w_a$ constant is annotated: it presumes a single observable-relevant slow mode, now known insufficient and, under symmetry, mis-anchored.


What this feeds: exp2 tests whether the degenerate manifold and the symmetry picture survive a block-Gibbs (THRML) update scheme.

— fin. —