Exp 11 — Larger-λ Sweep: The Dose-Response Is the Result

Turning the regularizer knob harder does not buy steering — it buys a steeper trade between an inflating diagnostic and a collapsing task.

This is the complete technical record for experiments/exp11-htdml-objective-lambda-sweep/. Here we keep the numbers, the gates, and the claim-status discipline.

The question

exp9 showed that at $\lambda = 0.1$ the trainability objective $L = L_{KL} + \lambda\,(-\log Q_{struct}^{\perp})$ under-steers. The obvious rejoinder is to push $\lambda$ higher. So exp11 asks two things, both at G1-level objective-usability (never G2). First, does a larger $\lambda$ steer $Q_{struct}^{\perp}$ past the bar? Second, if it does, does it steer through the genuine $\gamma_{eff}$ channel (faster mixing shrinking the gradient-noise denominator) rather than through anti-convergence or a $\mu_C$-shrink artifact?
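As a minimal sketch of how $\lambda$ enters the objective above, the scalar form can be written out directly. The function name and the input values are illustrative assumptions, not taken from the exp11 runner; only the formula $L = L_{KL} + \lambda\,(-\log Q_{struct}^{\perp})$ and the five-rung ladder come from the text.

```python
import math

def trainability_objective(l_kl: float, q_struct_perp: float, lam: float) -> float:
    """Scalar form of L = L_KL + lam * (-log Q_struct_perp).

    Hypothetical helper for illustration; the real runner evaluates this
    in-loop with gradients, which this sketch does not attempt.
    """
    if q_struct_perp <= 0.0:
        raise ValueError("Q_struct_perp must be positive for the log penalty")
    return l_kl + lam * (-math.log(q_struct_perp))

# Illustrative inputs only (NOT values from the experiment).
example_l_kl = 0.40
example_q = 0.50

# The frozen ladder from the setup; lam = 0 recovers the plain KL loss.
for lam in (0.0, 0.1, 0.3, 1.0, 3.0):
    print(lam, trainability_objective(example_l_kl, example_q, lam))
```

At fixed $L_{KL}$ and $Q_{struct}^{\perp} < 1$, the penalty term grows linearly in $\lambda$, so a larger $\lambda$ weights raising $Q_{struct}^{\perp}$ more heavily relative to lowering $L_{KL}$.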

The setup

A frozen five-rung ladder $\lambda \in \{0, 0.1, 0.3, 1.0, 3.0\}$ over 4 training cells ($N \le 10$, 4 teachers at $\beta_t = 3$), 20 runs total, Adam step cap 2000. Pre-committed at gate-1 (d8073cc, before any implementation); runner frozen at gate-2 (dfd1bc6) after rehearsal and a 4-lens adversarial audit (0 MAJOR). The verdict basis is per-arm: an arm is verdict-eligible only if it passes the $\ge 3/4$ held-c25 task guard and has $N_{pairs} \ge 4$. Steering needs both Leg 1 (median matched-crossing pooled-$Q$ ratio $\ge \rho = 1.5$) and Leg 2 (genuine-channel gate $G_\gamma \le 0.90$ at $\delta = 0.10$). Ran 2026-06-14, laptop CPU, 2711 s wall; JAX 0.9.1 (x64). wandb ran offline (instrumentation-only, non-verdict-bearing).
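The per-arm verdict logic described above can be sketched as a small decision function. This is an illustration under stated assumptions: the `Arm` record, the function names, and the verdict strings are hypothetical and not the runner's API. The thresholds ($\ge 3/4$ guard, $N_{pairs} \ge 4$, $\rho = 1.5$, $G_\gamma \le 0.90$) are the ones stated in the setup, and the two example arms use the $\lambda = 0.1$ and $\lambda = 0.3$ rows of the results table.

```python
from dataclasses import dataclass
from typing import Optional

GUARD_MIN_FRACTION = 3 / 4  # held-c25 task guard (from the text)
MIN_PAIRS = 4               # N_pairs floor (from the text)
RHO = 1.5                   # Leg 1 bar on the median pooled-Q ratio
G_GAMMA_MAX = 0.90          # Leg 2 bar on the genuine-channel gate

@dataclass
class Arm:
    """Hypothetical per-arm summary record, for illustration only."""
    name: str
    guard_passed: int
    guard_total: int
    n_pairs: int
    median_ratio: Optional[float]
    g_gamma: Optional[float]

def is_eligible(arm: Arm) -> bool:
    """Verdict-eligible only if the task guard and the pair floor both hold."""
    guard_ok = arm.guard_passed / arm.guard_total >= GUARD_MIN_FRACTION
    return guard_ok and arm.n_pairs >= MIN_PAIRS

def verdict(arm: Arm) -> str:
    """Steering requires eligibility plus BOTH legs."""
    if not is_eligible(arm):
        return "ineligible"
    leg1 = arm.median_ratio is not None and arm.median_ratio >= RHO
    leg2 = arm.g_gamma is not None and arm.g_gamma <= G_GAMMA_MAX
    return "steers" if (leg1 and leg2) else "does not steer"

# Rows taken from the results table.
lam0p1 = Arm("lam0p1", 4, 4, 8, 1.125, 1.010)
lam0p3 = Arm("lam0p3", 2, 4, 4, 2.744, None)

for arm in (lam0p1, lam0p3):
    print(arm.name, verdict(arm))
```

The ordering matters: eligibility is checked before either leg, so an ineligible arm's ratio is never read as evidence, however large it is.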

The result

P1 — feasibility + fidelity: PASS (construction/formula-level, never "empirically validated"). All 20 runs were finite (zero non-finite events); the value-agreement gate held across 26 comparisons at max $|Q_{JAX} - Q_{numpy}|/|Q_{numpy}| = 2.26\times10^{-14}$ (gate $10^{-10}$); the FD battery passed at all 4 cells. The factored resolvent ran in-loop across ~32 000 pooled-$Q$ evaluations with gradients, so the objective is computable, differentiable, and in-loop-usable at every $\lambda$.
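The value-agreement gate above is a maximum relative difference compared against a tolerance. A minimal sketch follows, assuming the two implementations each return a flat list of scalar $Q$ values; the function names and sample numbers are illustrative, and only the metric $|Q_{JAX} - Q_{numpy}|/|Q_{numpy}|$ and the $10^{-10}$ gate come from the text.

```python
from typing import Sequence

def max_relative_diff(candidate: Sequence[float], reference: Sequence[float]) -> float:
    """max over comparisons of |candidate - reference| / |reference|."""
    if len(candidate) != len(reference):
        raise ValueError("candidate and reference must have equal length")
    if any(r == 0.0 for r in reference):
        raise ValueError("reference values must be non-zero")
    return max(abs(c - r) / abs(r) for c, r in zip(candidate, reference))

def passes_value_agreement(candidate: Sequence[float],
                           reference: Sequence[float],
                           gate: float = 1e-10) -> bool:
    """True when the worst-case relative difference is within the gate."""
    return max_relative_diff(candidate, reference) <= gate

# Illustrative values only (NOT the experiment's Q values).
q_reference = [0.5, 1.25, 3.0]
q_close = [0.5, 1.25 * (1.0 + 1e-14), 3.0]   # round-off-level disagreement
q_far = [0.5, 1.25 * (1.0 + 1e-6), 3.0]      # a real discrepancy

print(passes_value_agreement(q_close, q_reference))
print(passes_value_agreement(q_far, q_reference))
```

Taking the maximum rather than the mean means a single bad comparison fails the gate, which is the conservative choice for a fidelity check.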

P2 — steering: Outcome 2, does NOT steer in this $\lambda$ range (scoped negative). The matched-crossing pooled-$Q$ ratio rises steeply while the guard-pass fraction collapses:

| arm | $\lambda$ | guard | eligible | $N_{pairs}$ | median ratio | Leg 1 | Leg 2 ($G_\gamma$) |
|---|---|---|---|---|---|---|---|
| baseline | 0 | 4/4 | — | — | — | — | — |
| lam0p1 | 0.1 | 4/4 | ✓ | 8 | 1.125 | FAIL ($<1.5$) | FAIL (1.010 $> 0.90$) |
| lam0p3 | 0.3 | 2/4 | ✗ | 4 | 2.744 | — | — |
| lam1p0 | 1.0 | 1/4 | ✗ | 2 | 101.37 | — | — |
| lam3p0 | 3.0 | 0/4 | ✗ | 0 | — | — | — |

The ratio climbs $1.125 \to 2.74 \to 101.4$ as the guard falls $4/4 \to 2/4 \to 1/4 \to 0/4$. The sole eligible arm, $\lambda = 0.1$, fails Leg 1 (median $1.125 < 1.5$, reproducing exp9 exactly). The large ratios at $\lambda \ge 0.3$ are non-verdict quantities: those arms fail the $\ge 3/4$ arm-level task guard, so their inflated $Q$ is pure anti-convergence, which the matched/held-crossing design excludes. At $\lambda \ge 1.0$ several runs re-cross their KL thresholds upward (e.g. $(4,2)\cdot\lambda{=}1.0$ ends at $L_{KL}/L_{KL,0} = 1.78$; $(5,2)\cdot\lambda{=}3.0$ at $2.11$), and the held-at-stop rule (D5) correctly excludes those hollow crossings.
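The distinction between a held crossing and a hollow one can be illustrated with a short sketch. The details of rule D5 are not given here, so this is one plausible reading and not the runner's implementation: a crossing counts only if the normalized loss $L_{KL}/L_{KL,0}$ reaches the threshold and is still at or below it at the final step. The threshold of 0.25 (a guess at what "c25" denotes), the function names, and both trajectories are assumptions for illustration; only the end value 1.78 echoes the text.

```python
from typing import Optional, Sequence

def first_crossing(ratios: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first step with L_KL / L_KL,0 <= threshold, else None."""
    for step, ratio in enumerate(ratios):
        if ratio <= threshold:
            return step
    return None

def held_crossing(ratios: Sequence[float], threshold: float) -> Optional[int]:
    """Crossing index only if the run is still at/below threshold at stop.

    Assumed reading of a held-at-stop rule: a run that dips under the
    threshold and then climbs back above it yields no crossing.
    """
    step = first_crossing(ratios, threshold)
    if step is None or ratios[-1] > threshold:
        return None
    return step

THRESHOLD = 0.25  # ASSUMPTION: "c25" read as 25% of the initial KL

# Illustrative trajectories of L_KL / L_KL,0 (NOT logged data).
held_run = [1.0, 0.6, 0.24, 0.20, 0.18]
hollow_run = [1.0, 0.6, 0.20, 0.90, 1.78]  # dips, then re-crosses upward

print(first_crossing(hollow_run, THRESHOLD), held_crossing(hollow_run, THRESHOLD))
print(first_crossing(held_run, THRESHOLD), held_crossing(held_run, THRESHOLD))
```

Under this reading, the hollow run does register a first crossing but contributes no matched pair, which is why anti-converging runs drop out of the verdict basis instead of inflating the ratio.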

Even the eligible arm's genuine channel is idle: the median $G_\gamma$ is $1.010 > 0.90$, so with gradient-mass overlap held fixed, faster mixing did not shrink the denominator. The movement is signal-side (numerator ratios 0.96–2.05) plus $\mu_C$-shrink: where the denominator fell, e.g. $(5,2)\cdot$c25 at 0.866, it came via $G_\mu = 0.884$, not $G_\gamma = 0.976$. That is exactly the artifact the $\gamma_{eff}$ gate exists to exclude.

P3 — KL guard: clean monotone collapse, $4/4, 2/4, 1/4, 0/4$ across $\lambda = 0.1, 0.3, 1.0, 3.0$.

P4 — R4 carry-over: verified on 288 cells (verify-or-HALT, no HALT; max A2 residual $4.16\times10^{-17}$).

P5 — descriptive: baseline reproduces exp8 bitwise (every matched rel-diff exactly $0.0$); $r_{P2}$ median $0.9989$ over $n = 288$.

Scope and caveats

Demo-level only. With 4 cells, $N \le 10$, one seed table, one teacher set, and five rungs, "does not steer / does not engage $\gamma_{eff}$" does not generalize beyond this preregistered grid, and never to HTDML at scale. This is a predictor, not an estimator: exp11 moves and measures $Q_{struct}^{\perp}$ exactly; no sampling, no $Q_{op}$. The conditional tier stays [solid], the operational tier [conjectured] ($K \gg \tau_{int}$, A7 open at scale). This is not "cannot steer": Outcome 2 is a scoped negative, and exp7's small-family crossable region keeps a genuine positive plausible elsewhere. No tag moves; G2 untouched.

What this feeds: the RBM-retrofit G1 column and the spine Risk-2 / HTDML-property annotations now record that no registered dose met both steering magnitude and task compatibility. This closes the larger-$\lambda$ frontier at demo level while leaving the operational factorization tier exactly where it was.