Thermodynamic Machine Learning · MMXXVI
Experiment · 14.VI.MMXXVI

Exp 11 — Larger-λ Sweep: The Dose-Response Is the Result

Entry 13

Turning the regularizer knob harder does not buy steering — it buys a steeper trade between an inflating diagnostic and a collapsing task.

This is the complete technical record for experiments/exp11-htdml-objective-lambda-sweep/. Here we keep the numbers, the gates, and the claim-status discipline.

The question

exp9 showed that at $\lambda = 0.1$ the trainability objective $L = L_{KL} + \lambda\,(-\log Q_{struct}^{\perp})$ under-steers. The obvious rejoinder: push $\lambda$ higher. So exp11 asks two things at G1-level objective-usability (never G2): does a larger $\lambda$ steer $Q_{struct}^{\perp}$ past the bar, and if it does, does it steer through the genuine $\gamma_{eff}$ channel (faster mixing shrinking the gradient-noise denominator) rather than through anti-convergence or a $\mu_C$-shrink artifact?
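To make the trade concrete, here is a minimal sketch of the objective as a scalar function. It assumes $L_{KL}$ and $Q_{struct}^{\perp}$ are already available as plain positive floats; the function names are hypothetical and this is not the experiment's runner. The second helper follows directly from the formula: inflating $Q_{struct}^{\perp}$ by a factor $r$ lowers $L$ as long as $L_{KL}$ rises by less than $\lambda \ln r$.

```python
import math

# Frozen lambda ladder quoted in this entry.
LADDER = (0.0, 0.1, 0.3, 1.0, 3.0)


def trainability_objective(l_kl: float, q_struct_perp: float, lam: float) -> float:
    """L = L_KL + lam * (-log Q). Hypothetical scalar stand-in for the real objective."""
    if q_struct_perp <= 0.0:
        raise ValueError("Q must be positive for the log to be defined")
    return l_kl + lam * (-math.log(q_struct_perp))


def break_even_kl_increase(lam: float, q_ratio: float) -> float:
    """Rise in L_KL at which inflating Q by q_ratio leaves L unchanged.

    Any smaller rise in L_KL still lowers L, so this is the amount of task
    loss the optimizer can accept in exchange for the Q inflation.
    """
    if q_ratio <= 0.0:
        raise ValueError("q_ratio must be positive")
    return lam * math.log(q_ratio)


if __name__ == "__main__":
    for lam in LADDER:
        budget = break_even_kl_increase(lam, 2.0)
        print(f"lambda={lam:<4} break-even KL rise for a 2x Q inflation: {budget:.4f}")
```

The break-even budget scales linearly in $\lambda$, which is one way to read why larger rungs can trade task loss for diagnostic inflation.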

The setup

A frozen five-rung ladder $\lambda \in \{0, 0.1, 0.3, 1.0, 3.0\}$ over 4 training cells ($N \le 10$, 4 teachers at $\beta_t = 3$), 20 runs total, Adam step cap 2000. Pre-committed at gate-1 (d8073cc, before any implementation); runner frozen at gate-2 (dfd1bc6) after rehearsal and a 4-lens adversarial audit (0 MAJOR). The verdict basis is per-arm: an arm is verdict-eligible only if it passes the $\ge 3/4$ held-c25 task guard and has $N_{pairs} \ge 4$. Steering needs Leg 1 (median matched-crossing pooled-$Q$ ratio $\ge \rho = 1.5$) and Leg 2 (genuine-channel gate $G_\gamma \le 0.90$ at $\delta = 0.10$). Ran 2026-06-14, laptop CPU, 2711 s wall; JAX 0.9.1 (x64). wandb ran offline (instrumentation-only, non-verdict-bearing).
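The per-arm verdict logic described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the frozen runner: the class and function names are hypothetical, Leg 2 is assumed to be evaluated on a per-arm median of $G_\gamma$, and the role of $\delta = 0.10$ inside the gate is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

MIN_PAIRS = 4        # N_pairs >= 4
RHO = 1.5            # Leg 1 bar on the median matched-crossing pooled-Q ratio
G_GAMMA_MAX = 0.90   # Leg 2 bar on the genuine-channel gate


@dataclass(frozen=True)
class ArmSummary:
    """Hypothetical per-arm summary; field names are illustrative."""
    guard_passes: int                 # cells passing the held-c25 task guard
    n_cells: int                      # cells in the arm (4 in this sweep)
    n_pairs: int                      # matched crossing pairs
    median_ratio: Optional[float]     # None when no pairs exist
    g_gamma_median: Optional[float]   # None when the gate was not evaluated


def is_eligible(arm: ArmSummary) -> bool:
    # Integer form of guard_passes / n_cells >= 3/4, avoiding float rounding.
    guard_ok = 4 * arm.guard_passes >= 3 * arm.n_cells
    return guard_ok and arm.n_pairs >= MIN_PAIRS


def verdict(arm: ArmSummary) -> str:
    if not is_eligible(arm):
        return "ineligible"
    if arm.median_ratio is None or arm.g_gamma_median is None:
        raise ValueError("an eligible arm needs both leg statistics")
    leg1 = arm.median_ratio >= RHO
    leg2 = arm.g_gamma_median <= G_GAMMA_MAX
    return "steers" if (leg1 and leg2) else "does not steer"


if __name__ == "__main__":
    # Illustrative inputs only, not results from the sweep.
    print(verdict(ArmSummary(4, 4, 6, 1.8, 0.85)))
    print(verdict(ArmSummary(1, 4, 2, 50.0, None)))
```

Checking eligibility before either leg is the point of the design: a large ratio on an arm that fails the task guard never reaches the steering test.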

The result

P1 — feasibility + fidelity: PASS (construction/formula-level, never "empirically validated"). All 20 runs finite (zero non-finite events); the value-agreement gate held across 26 comparisons at max $|Q_{JAX} - Q_{numpy}|/|Q_{numpy}| = 2.26\times10^{-14}$ (gate $10^{-10}$); the FD battery passed at all 4 cells. The factored resolvent ran in-loop across ~32 000 pooled-$Q$ evaluations with gradients: the objective is computable, differentiable, and in-loop-usable at every $\lambda$.
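A value-agreement gate of this kind can be sketched in a few lines. The version below is a minimal illustration in plain Python, assuming the two implementations' outputs have already been collected into equal-length lists of floats; the function names are hypothetical, and the reference values are assumed to be nonzero so that the relative difference is defined.

```python
import math
from typing import Sequence


def max_relative_difference(test: Sequence[float], ref: Sequence[float]) -> float:
    """Return max |test - ref| / |ref| over paired values.

    A non-finite value on either side yields +inf, so it can never pass a gate.
    """
    if len(test) != len(ref) or len(ref) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    worst = 0.0
    for t, r in zip(test, ref):
        if not (math.isfinite(t) and math.isfinite(r)):
            return math.inf
        if r == 0.0:
            raise ValueError("relative difference is undefined for a zero reference")
        worst = max(worst, abs(t - r) / abs(r))
    return worst


def passes_value_agreement(test: Sequence[float], ref: Sequence[float],
                           gate: float = 1e-10) -> bool:
    """True when the worst relative difference is at or below the gate."""
    return max_relative_difference(test, ref) <= gate


if __name__ == "__main__":
    reference = [0.25, 1.5, 3.0]
    candidate = [0.25, 1.5, 3.0 * (1.0 + 1e-8)]  # deliberately outside the gate
    print(max_relative_difference(candidate, reference))
    print(passes_value_agreement(candidate, reference))
```

Reporting the maximum rather than a mean keeps a single bad comparison from being averaged away.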

P2 — steering: Outcome 2, does NOT steer in this $\lambda$ range (scoped negative). The matched-crossing pooled-$Q$ ratio rises steeply while the guard-pass fraction collapses:

| arm | $\lambda$ | guard | eligible | $N_{pairs}$ | median ratio | Leg 1 | Leg 2 ($G_\gamma$) |
|---|---|---|---|---|---|---|---|
| baseline | 0 | 4/4 | — | — | — | — | — |
| lam0p1 | 0.1 | 4/4 | ✓ | 8 | 1.125 | FAIL ($< 1.5$) | FAIL (1.010 $> 0.90$) |
| lam0p3 | 0.3 | 2/4 | ✗ | 4 | 2.744 | — | — |
| lam1p0 | 1.0 | 1/4 | ✗ | 2 | 101.37 | — | — |
| lam3p0 | 3.0 | 0/4 | ✗ | 0 | — | — | — |

The ratio climbs $1.125 \to 2.74 \to 101.4$ as the guard falls $4/4 \to 2/4 \to 1/4 \to 0/4$. The sole eligible arm, $\lambda = 0.1$, fails Leg 1 (median $1.125 < 1.5$, reproducing exp9 exactly). The large ratios at $\lambda \ge 0.3$ are non-verdict quantities: those arms fail the $\ge 3/4$ arm-level task guard, so their inflated $Q$ is pure anti-convergence that the matched/held-crossing design excludes. At $\lambda \ge 1.0$ several arms re-cross their KL thresholds upward (e.g. $(4,2)\cdot\lambda{=}1.0$ ends at $L_{KL}/L_{KL,0} = 1.78$; $(5,2)\cdot\lambda{=}3.0$ at $2.11$), and the held-at-stop rule (D5) correctly excludes those hollow crossings.
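The idea behind excluding hollow crossings can be sketched as follows. This is an assumption-laden illustration, not the D5 rule as preregistered: it assumes a run is summarized by a trace of $L_{KL}/L_{KL,0}$ values, that a crossing means the trace first reaches a threshold fraction, and that "held at stop" means the final value is still at or below that threshold. The function names and the 0.25 threshold used in the example (a guess at what "c25" denotes) are hypothetical.

```python
from typing import Optional, Sequence


def first_crossing_step(kl_ratio_trace: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first step where L_KL / L_KL0 is at or below the threshold."""
    for step, ratio in enumerate(kl_ratio_trace):
        if ratio <= threshold:
            return step
    return None


def held_crossing_step(kl_ratio_trace: Sequence[float], threshold: float) -> Optional[int]:
    """First crossing step, but only if the run still sits below the threshold at stop.

    A run that crosses downward and later climbs back above the threshold is
    treated as a hollow crossing and returns None.
    """
    step = first_crossing_step(kl_ratio_trace, threshold)
    if step is None:
        return None
    if kl_ratio_trace[-1] > threshold:
        return None  # re-crossed upward before the stop: hollow
    return step


if __name__ == "__main__":
    converged = [1.0, 0.6, 0.2, 0.1]     # illustrative trace, stays below 0.25
    rebounded = [1.0, 0.2, 0.9, 1.78]    # illustrative trace, ends above its start
    print(held_crossing_step(converged, 0.25))
    print(held_crossing_step(rebounded, 0.25))
```

Under these assumptions a run that ends at $L_{KL}/L_{KL,0} > 1$ can never contribute a crossing pair, whatever happened earlier in training.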

Even the eligible arm's genuine channel is idle: $G_\gamma$ median $1.010 > 0.90$, so holding gradient-mass overlap fixed, faster mixing did not shrink the denominator. The movement is signal-side (numerator ratios 0.96–2.05) plus $\mu_C$-shrink (where the denominator fell, e.g. $(5,2)\cdot$c25 at 0.866, it came via $G_\mu = 0.884$, not $G_\gamma = 0.976$), which is exactly the artifact the $\gamma_{eff}$ gate exists to exclude.

P3 — KL guard: clean monotone collapse, $4/4, 2/4, 1/4, 0/4$.

P4 — R4 carry-over: verified on 288 cells (verify-or-HALT, no HALT; max A2 residual $4.16\times10^{-17}$).

P5 — descriptive: baseline reproduces exp8 bitwise (every matched rel-diff exactly $0.0$); $r_{P2}$ median $0.9989$ over $n = 288$.

Scope and caveats

Demo-level only. 4 cells, $N \le 10$, one seed table, one teacher set, five rungs: "does not steer / does not engage $\gamma_{eff}$" does not generalize beyond this preregistered grid, and never to HTDML at scale. This is a predictor, not an estimator: exp11 moves and measures $Q_{struct}^{\perp}$ exactly; no sampling, no $Q_{op}$. The conditional tier stays [solid], the operational tier [conjectured] ($K \gg \tau_{int}$, A7 open at scale). This is not "cannot steer": Outcome 2 is a scoped negative; exp7's small-family crossable region keeps a genuine positive plausible elsewhere. No tag moves; G2 untouched.


What this feeds: the RBM-retrofit G1 column and the spine Risk-2 / HTDML-property annotations now record that no registered dose met both steering magnitude and task compatibility, closing the larger-$\lambda$ frontier at demo level while leaving the operational factorization tier exactly where it was.

— fin. —