We took the structural trainability predictor, made it the auxiliary term of a loss, and trained against it, to see whether it is usable and not just measurable. The honest answer here is "computable and differentiable, yes; steering, no."
The question
The companion predictor work (experiments/exp1-exact-diag/) established a differentiable, observable-projected predictor. The escalation: if the predictor is differentiable in the couplings, does adding it to the objective push a model toward higher trainability? This is the first in-loop test; exp8 only evaluated the predictor at isolated anchors. Behind it sits the flagship thesis's warning that you can't train your way to trainability, and exp9 puts a number on that claim.
The setup
Loss with an auxiliary trainability term:
Three arms (baseline , primary , secondary ) across 4 training cells (, hidden ), Adam, step cap; 12 runs total. The verdict arm is the primary . The pooled uses Change A, an -RHS factored resolvent run at every Adam step of the 8 augmented runs (16,000 in-loop, gradient-bearing evaluations). Everything was frozen pre-run: constants, seed table, the -only matched-crossing verdict basis, pair rules, and the steering bar . The full run took 1913 s on a laptop CPU (JAX 0.9.1, x64), within the declared 4 h cap.
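A minimal sketch of such an objective, assuming the form L = L_task − λ·log(predictor), so that λ = 0 recovers the baseline arm and the arms differ only in λ (both are assumptions; the exp9 equation may weight or transform the predictor differently). The toy task loss and toy predictor are hypothetical stand-ins for the KL and the observable-projected predictor. The same block sketches the kind of autodiff-vs-FD fidelity check reported under P1, with a hand-derived gradient standing in for autodiff and a central difference taken at the best step size from a small grid.

```python
import math

# Hypothetical toy stand-ins: exp9's real task loss is a KL and its real
# aux signal is the observable-projected predictor; neither is reproduced.


def task_loss(theta, target):
    """Toy task loss: half the squared distance to a target vector."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(theta, target))


def predictor(theta):
    """Toy positive 'trainability' surrogate (always >= 1)."""
    return 1.0 + sum(a * a for a in theta)


def augmented_loss(theta, target, lam):
    """Assumed form: task loss minus lam * log(predictor); lam = 0 is baseline."""
    return task_loss(theta, target) - lam * math.log(predictor(theta))


def augmented_grad(theta, target, lam):
    """Hand-derived gradient of augmented_loss (stands in for autodiff)."""
    s = predictor(theta)
    return [(a - b) - lam * 2.0 * a / s for a, b in zip(theta, target)]


def fd_grad(f, theta, eps):
    """Central finite-difference gradient of a scalar function f at theta."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def best_eps_rel_err(f, grad_fn, theta, eps_grid=(1e-3, 1e-4, 1e-5, 1e-6)):
    """Relative gradient error vs FD, keeping the best step size in eps_grid."""
    exact = grad_fn(theta)
    norm = math.sqrt(sum(x * x for x in exact)) or 1.0
    errors = []
    for eps in eps_grid:
        approx = fd_grad(f, theta, eps)
        diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(exact, approx)))
        errors.append(diff / norm)
    return min(errors)


# Usage: one parameter point, one aux weight (illustrative values).
theta, target, lam = [0.3, -1.2, 0.8], [1.0, 0.0, -0.5], 0.1
err = best_eps_rel_err(
    lambda th: augmented_loss(th, target, lam),
    lambda th: augmented_grad(th, target, lam),
    theta,
)
print(f"best-eps relative error: {err:.2e}")
```

Taking the maximum of this best-ε error over all cells gives a "worst best-ε" figure of the kind gated in P1.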
The result
Registered Outcome 3 fired: P1 PASS + P4 verified + P3 PASS + P2 FAIL. The objective is feasible but does not steer at the verdict .
- P1 (feasibility + fidelity): PASS (construction/formula-level). All 12 runs stayed finite at every step; value-agreement worst case over 24 comparisons (gate ); autodiff-vs-FD worst best- rel-err per cell (gate ), PASS .
- P2 (steering): FAIL. Paired ratios at equal KL progress over 8 pairs gave median (the median leg failed). The consistency leg passed on its own: count(ratio ) . So the objective moves in the intended direction almost everywhere, but the matched-progress effect size (median ) is far below the demo bar.
- P3 (KL-compatibility guard): PASS. Every arm crossed KL within the cap (primary ); the augmented arms crossed earlier than baseline yet stalled at a higher final KL. They trade away terminal convergence, not early progress.
- P4 (R4 carry-over): verified over 192 diagnostic cells; max detailed-balance residual , Cheeger margin , zero sym-check failures; the armed HALT never fired.
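The P2 verdict has two legs, and only one passed. A minimal sketch of that logic, assuming each ratio is the augmented-over-baseline predictor at matched KL progress, that "intended direction" means ratio > 1, and that both legs must pass; the bar, count threshold, and ratios below are illustrative placeholders, not exp9's frozen values or measurements.

```python
from statistics import median


def steering_verdict(ratios, bar, min_count):
    """Two-leg steering check on paired predictor ratios (augmented / baseline).

    ratios: one ratio per pair, each taken at matched KL progress.
    bar: the median effect size required by the median leg.
    min_count: pairs that must show ratio > 1 for the consistency leg.
    """
    med = median(ratios)
    n_up = sum(1 for r in ratios if r > 1.0)
    median_leg = med >= bar
    consistency_leg = n_up >= min_count
    return {
        "median": med,
        "n_up": n_up,
        "median_leg": median_leg,
        "consistency_leg": consistency_leg,
        "pass": median_leg and consistency_leg,
    }


# Illustrative numbers only (not exp9's frozen bar or measured ratios):
# direction is right in 7 of 8 pairs, but the median effect is small.
ratios = [1.02, 1.05, 1.01, 0.99, 1.03, 1.04, 1.08, 1.60]
verdict = steering_verdict(ratios, bar=1.5, min_count=6)
print(verdict)
```

Note how a single large pair lifts neither the median nor the verdict: the median is robust to it by design.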
Decomposition (D11.i): the gains are dominated by the signal side. Numerator ratios span –, while denominator (noise) ratios span only –. Even the one large pair ((5,2)·c25, ratio ) is num den . Per the pre-commitment, a gain driven by the signal side is the anti-convergence-flavored channel, the weaker steering mode. The mixing/estimability channel ( down) was essentially unmoved.
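Assuming the pooled predictor is a ratio of a signal numerator to a noise denominator, each pair's gain factors exactly into num_ratio / den_ratio, and comparing the logs of the two factors shows which side drives it. A sketch with hypothetical numbers (the function name and inputs are illustrative):

```python
import math


def decompose_ratio(num_aug, den_aug, num_base, den_base):
    """Factor a ratio of signal/noise predictors into its two sides.

    With predictor = num / den, the paired ratio is
    (num_aug / den_aug) / (num_base / den_base) = num_ratio / den_ratio.
    """
    num_ratio = num_aug / num_base
    den_ratio = den_aug / den_base
    ratio = num_ratio / den_ratio
    # In log space the two sides add: log(ratio) = log(num_ratio) - log(den_ratio).
    signal_log = math.log(num_ratio)
    noise_log = -math.log(den_ratio)
    driver = "signal" if abs(signal_log) >= abs(noise_log) else "noise"
    return ratio, num_ratio, den_ratio, driver


# Hypothetical pair: numerator doubles, noise drops by under 5%.
ratio, num_ratio, den_ratio, driver = decompose_ratio(2.0, 1.0, 1.0, 1.05)
print(f"ratio={ratio:.3f} num x{num_ratio:.3f} den x{den_ratio:.3f} -> {driver}")
```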
Dose-response (D11.iv): the secondary arm is a clean null (median , ). Monotone in : , , .
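A dose-response trend like this can be checked by ordering the arms by λ and testing that the per-arm median ratio never decreases. The λ values and medians below are illustrative, not exp9's. Monotonicity says nothing about effect size, so a clean monotone trend cannot by itself overturn a failed median leg.

```python
def is_monotone(points, strict=False):
    """True if the median ratio never decreases (or strictly increases) with lam.

    points: (lam, median_ratio) pairs, one per arm, in any order.
    """
    values = [m for _, m in sorted(points)]
    pairs = zip(values, values[1:])
    if strict:
        return all(a < b for a, b in pairs)
    return all(a <= b for a, b in pairs)


# Illustrative arm summaries (not exp9's values): baseline, a small lam, a larger lam.
arms = [(0.1, 1.04), (0.0, 1.00), (0.01, 1.002)]
print(is_monotone(arms, strict=True))
```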
Scope and caveats
This is a predictor result, not an estimator result: exp9 moved and measured exactly — no sampling, no ran. Nothing here bears on whether high implies high actual SNR; the operational factorization tier stays [conjectured], gates unchanged. No tag moves; G2 untouched.
The scope is demo-level: 4 cells, , one seed table, 4 teachers at , two points. "Does not steer here" is scoped to this family/architecture/teacher set/ pair and is never a claim about fundamentality.
The headline trap is the final-checkpoint numbers. At final checkpoints the primary arm holds pooled at – while baselines collapse to –, an apparent "gain" at , up to at . That number is pure anti-convergence: the aux term freezes high, keeping large. D5 excludes finals from the verdict by design; the matched-crossing basis removed exactly this confound, leaving the honest .
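The confound is easy to reproduce on toy trajectories. This sketch (hypothetical data layout and numbers) compares the predictor ratio at each run's first KL crossing with the ratio at the last checkpoint: an augmented run that crosses early but stalls shows a large apparent "gain" at the final checkpoint, while the matched-crossing ratio stays near 1.

```python
def first_crossing(kl_traj, threshold):
    """Index of the first step with KL at or below threshold, or None."""
    for step, kl in enumerate(kl_traj):
        if kl <= threshold:
            return step
    return None


def matched_and_final_ratio(aug, base, threshold):
    """Predictor ratio (aug / base) at each run's first KL crossing and at the end.

    aug, base: dicts with 'kl' and 'pred' trajectories (hypothetical layout).
    """
    i_aug = first_crossing(aug["kl"], threshold)
    i_base = first_crossing(base["kl"], threshold)
    if i_aug is None or i_base is None:
        raise ValueError("a run never crossed the KL threshold")
    matched = aug["pred"][i_aug] / base["pred"][i_base]
    final = aug["pred"][-1] / base["pred"][-1]
    return matched, final


# Toy runs: the augmented arm crosses earlier but stalls at a higher final KL,
# while the baseline keeps converging and its predictor collapses.
base = {"kl": [1.0, 0.6, 0.3, 0.1, 0.02], "pred": [1.0, 0.9, 0.8, 0.4, 0.1]}
aug = {"kl": [1.0, 0.29, 0.25, 0.2, 0.19], "pred": [1.0, 0.82, 0.81, 0.8, 0.79]}
matched, final = matched_and_final_ratio(aug, base, threshold=0.3)
print(f"matched-crossing ratio {matched:.3f}, final-checkpoint ratio {final:.1f}")
```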
What this feeds: the monotone dose-response trend (, consistency ) points toward a larger- probe. That, however, is a new design decision for the researcher under a fresh pre-commitment, not a continuation of this frozen one-shot set. Outcome 3 returns the design to the researcher; the conferred annotations (scan G1 cell, spine Risk-2 sub-bullet, HTDML-property parenthetical) record this as a demo-level negative.