Exp 9 — The Q Objective Demo: It Doesn’t Steer · Thermodynamic Machine Learning

We took the structural trainability predictor, made it the aux term of a loss, and trained against it — to see whether $Q_{struct}^{\perp}$ is not just measurable but usable; the honest answer here is "computable and differentiable, yes; steering, no."

The question

The companion predictor work (experiments/exp1-exact-diag/) established a differentiable, observable-projected predictor. The escalation: if it is differentiable in the couplings $J$ , does adding it to the objective push a model toward higher trainability? This is the first in-loop test — exp8 only evaluated the predictor at isolated anchors. The deeper warning behind it is the flagship thesis that you can't train your way to trainability; exp9 puts that claim to a number.

The setup

Loss with an auxiliary trainability term:

$L = L_{KL} + \lambda\,\bigl(-\log Q_{pool}\bigr),\qquad \lambda \in \{0,\,0.1,\,0.01\}.$

Three arms (baseline $\lambda{=}0$ , primary $\lambda{=}0.1$ , secondary $\lambda{=}0.01$ ) across 4 training cells ( $m\in\{4,5\}$ , hidden $\in\{2,3\}$ ), Adam, step cap $2000$ — 12 runs total. The verdict arm is the primary $\lambda{=}0.1$ . The pooled $Q$ uses Change A, an $m^2$ -RHS factored resolvent ran every Adam step of the 8 augmented runs (16,000 in-loop gradient-bearing $Q$ evaluations). Everything was frozen pre-run: constants, seed table, the $c50/c25$ -only matched-crossing verdict basis, pair rules, and the steering bar $\rho = 1.5$ . Run: 1913 s on laptop CPU, JAX 0.9.1 (x64), all under the declared 4 h cap.

The result

Registered Outcome 3 fired: P1 PASS + P4 verified + P3 PASS + P2 FAIL. The objective is feasible but does not steer at the verdict $\lambda$ .

P1 — feasibility + fidelity: PASS (construction/formula-level). All 12 runs finite at every step; value-agreement worst case $|Q_{JAX}-Q_{numpy}|/|Q_{numpy}| = 2.26\times10^{-14}$ over 24 comparisons (gate $10^{-10}$ ); autodiff-vs-FD worst best- $h$ rel-err per cell $\{2.51,2.53,1.07,1.96\}\times10^{-9}$ (gate $10^{-4}$ ), PASS $\times 4$ .
P2 — steering: FAIL. Paired ratios $Q^{aug}/Q^{base}$ at equal KL progress over 8 pairs gave median $1.125 < \rho = 1.5$ (the median leg failed). The consistency leg passed alone: count(ratio $>1$ ) $= 7/8 \geq 6$ . So the objective moves $Q_{struct}^{\perp}$ in the intended direction almost everywhere, but the matched-progress effect size (median $+12.5\%$ ) is far below the $+50\%$ demo bar.
P3 — KL-compatibility guard: PASS. Every arm crossed $\leq 0.25$ KL within the cap (primary $4/4$ ); augmented arms crossed earlier than baseline yet stalled at higher final $L_{KL}$ — they trade terminal convergence, not early progress.
P4 — R4 carry-over: verified over 192 diagnostic cells; max detailed-balance residual $4.16\times10^{-17}$ , Cheeger margin $+3.68\times10^{-3}$ , zero sym-check failures; the armed HALT never fired.

Decomposition (D11.i): the gains are signal-side dominated — numerator ratios span $0.96$ – $2.05$ , denominator (noise) ratios only $0.87$ – $1.04$ . Even the one large pair ((5,2)·c25, ratio $2.37$ ) is num $2.05 \times$ den $^{-1}$ $1.15$ . Per the pre-commitment, a signal-side-driven gain is the anti-convergence-flavored channel — the weaker steering mode. The mixing/estimability channel ( $T_O$ down) was essentially unmoved.

Dose-response (D11.iv): the secondary $\lambda{=}0.01$ arm is a clean null (median $1.0002$ , $4/8>1$ ). Monotone in $\lambda$ : $0\to1$ , $0.01\to{\sim}1.00$ , $0.1\to{\sim}1.13$ .

Scope and caveats

This is a predictor result, not an estimator result: exp9 moved and measured $Q_{struct}^{\perp}$ exactly — no sampling, no $Q_{op}$ ran. Nothing here bears on whether high $Q_{struct}^{\perp}$ implies high actual SNR; the operational factorization tier stays [conjectured], gates unchanged. No tag moves; G2 untouched.

The scope is demo-level: 4 cells, $N \leq 10$ , one seed table, 4 teachers at $\beta_t = 3$ , two $\lambda$ points. "Does not steer here" is scoped to this family/architecture/teacher set/ $\lambda$ pair — never fundamentality.

The headline trap is the finals. At final checkpoints the primary arm holds pooled $Q$ at $O(10$ – $10^2)$ while baselines collapse to $O(10^{-8}$ – $10^{-5})$ — an apparent $\geq 5\times10^{6}$ "gain" at $(4,2)$ , up to ${\sim}10^{9}$ at $(5,3)$ . That number is pure anti-convergence: the aux term freezes $L_{KL}$ high, keeping $\lVert g\rVert$ large. D5 excludes finals from the verdict by design; the matched-crossing basis removed exactly this confound, leaving the honest $+12.5\%$ .

What this feeds: the dose-response monotone trend ( $1.00 \to 1.13$ , consistency $7/8$ ) points at a larger- $\lambda$ probe — but that is a new design decision for the researcher under a fresh pre-commitment, not a continuation of this frozen one-shot $\lambda$ set. Outcome 3 returns the design to the researcher; the conferred annotations (scan G1 cell, spine Risk-2 sub-bullet, HTDML-property parenthetical) record this as a demo-level negative.