What Didn’t Work — and Why the Negative Results Matter

A synthesis of the program's honest negatives: the operational trainability claim is gated not by a budget shortfall but by a structural obstruction, and each failure is sharp enough to be a result.

The Q-program proves a conditional factorization $Q_{op} \approx Q_{struct}^{\perp}$ in regime A1–A8 + plateau + F4 ([solid], written proof O1–O6 over the single proven-here lemma O1.c). The operational claim — that this holds on a real DTM at the $K$ one actually runs — stays [conjectured], gated by A2 ( $\pi$ -reversibility, which the no-leakage machinery needs) and A6 ( $K \gg \tau_{int}$ ). What follows is the record of trying to meet those gates.

The A2 ↔ A6 obstruction

The question. Can a kernel be both reversible (A2) and fast-mixing enough that $K \gg \tau_{int}$ (A6)? Nothing formally forbids it — but the factorization is about the multimodal plateau, exactly where the two become operationally antagonistic.

The setup. experiments/exp4-reversible-50tau/ fixed exp3's blockers: a genuinely reversible symmetrized 4-block Gibbs kernel $M = \tfrac12(P_{fwd}+P_{rev})$ (self-adjointness gated to $\sim 10^{-17}$ ), then a non-circular doubling-stability probe for $\hat\tau$ on the real 60_12 MNIST DTM at $t=200$ (Lightning H200).

The result. $\tau_{max}$ does not stabilize — it grows dead-linearly in trajectory length $L$ : $166.7 \to 5{,}280$ over $L = 1{,}000 \to 32{,}000$ , $\tau/L \approx 0.16$ across six doublings, and the doubling-stability rule ( $|\tau(2L)-\tau(L)|/\tau(L) < 0.15$ ) is never met. So $\hat\tau$ is UNRESOLVED, the $\gtrsim 50\tau$ windows are uninstantiable, and A6 is unreachable — registered outcome (F): P0-HALT, P1–P5 unrun. exp6 reproduced $\tau \propto L$ at every checkpoint $t \in \{25,50,100,200\}$ , pushing the reading from "deep-checkpoint effect" toward "slow from the start."

Scope and caveats. The exp3→exp4 jump is confounded (kernel and window both changed), so $\tau \ge 5{,}280$ is not pure truncation — but the clean attribution is to the reversible kernel mixing far slower. The deeper tension: exp3's faster kernel violated A2; the A2-valid kernel is the slow one. $\tau \propto L$ reads either as a genuine near-zero gap or as inadequate burn-in, but both give the same operational conclusion. No tag flip: A6 is unreachable here, not the conjecture false.

The thermodynamic-length wall

The question. If a single reversible kernel mixes too slowly, can a reversible parallel-tempering ladder bridge a hot, free-mixing regime down to the cold target — cheaply?

The setup. experiments/exp19-hotter-top-first/ ran an equal-acceptance reversible-PT mixing probe on the trained 60_12 DTM (single-input conditional, INPUT_IDX=0), with a hot-top sweep ( $\alpha_{top}$ floored at 0.01) and an analytic rung-count $R^*(\alpha_{top}) = 1 + \text{round}(\beta \int_{\alpha_{top}}^{1}\!\sqrt{C}\,d\alpha / \delta^*)$ , $\delta^* = 1.683$ . The trained-weight-refresh guard ( $refresh\_ok=True$ ) passed, fixing the bug that invalidated the earlier exp15/16 reads.

The result. A hotter top does cure decorrelation — the single replica reaches $\tau \le 25$ at $\alpha_{top}=0.02$ ( $\tau = 15.8$ ; at $\alpha=0.01$ , $\tau=2.2$ ): a temperature effect, not local-kernel non-ergodicity. But the cheapest decorrelating span $[0.02, 1.0]$ needs $R^* = 136$ rungs $\gg R_{max} = 96$ . The two frontiers do not overlap — decorrelation needs $\alpha_{top} \le 0.02$ , tractability needs $\alpha_{top} \ge {\sim}0.25$ ( $R^*(0.25)=94$ ). The gap is a thermodynamic-length wall of $\beta\!\int\!\sqrt{C} \approx 227$ — decisive STOP at Stage A.

Scope and caveats. Config-scoped (60_12, seed 0, $t=200$ , this kernel and ladder family) — never a fundamentality verdict about reversible PT. $R^*$ is a Gaussian-overlap lower bound, so the true cost can only be higher. exp18 had already shown PT fails by schedule, not by overlap: a uniform- $\Delta\alpha$ ladder over $[0.5,1.0]$ missed the acceptance band not because spacing can't move acceptance (it moves it $\times 33$ ) but because the DTM's specific heat is non-uniform — the hot edge clears the floor only after the cold edges overshoot the ceiling (PT-MARGINAL). exp19 then resolved both of exp18's suspects and the ladder is still intractable. MEASURE-ONLY; moves no tag.

You cannot optimize the proxy directly

The question. $Q_{struct}^{\perp}$ is computable without training to convergence and differentiable in $J$ . Can you therefore use it as an in-loop objective (HTDML) to steer a model toward trainability?

The setup. experiments/exp11-htdml-objective-lambda-sweep/ swept the objective weight $\lambda \in \{0, 0.1, 0.3, 1.0, 3.0\}$ over 4 cells, with a matched-/held-crossing design and a KL-compatibility task guard.

The result. The computability + differentiability-in-use half is PASS — up to $32{,}000$ in-loop pooled- $Q$ evaluations with gradients, agreeing with the unmodified numpy spectral pipeline to $2.26\times10^{-14}$ (P1). But the steering half does not steer (registered Outcome 2). As $\lambda$ rises the matched-crossing ratio climbs ( $1.125 \to 2.74 \to 101.4 \to$ undefined) while the task-guard pass fraction collapses ( $4/4 \to 2/4 \to 1/4 \to 0/4$ ). The only verdict-eligible arm, $\lambda=0.1$ , fails Leg 1 (median $1.125 < 1.5$ , reproducing exp9 exactly) and its genuine-channel gate ( $G\gamma$ median $1.010 > 0.90$ — the $\gamma_{eff}$ channel never engages). The large ratios at $\lambda \ge 0.3$ are non-verdict quantities: they buy $Q$ by anti-convergence that abandons the KL task. This is Goodhart — push the proxy past the bar and you destroy what it proxies for.

Scope and caveats. Demo-level scope ( $N \le 10$ , one seed table, this $\lambda$ ladder) — a scoped negative, never "cannot steer." It is a predictor measurement, not a $Q_{op}$ validation: no sampling ran. No tag moves; G2 untouched.

What this feeds: the two pre-registered pivots from exp19 — switch sampler families (hierarchical PT / simulated tempering / population annealing, at the risk of breaking A2 and trading a mixing problem for a theorem problem) or the theory route of a $\tau_{int}$ -robust $Q_{op}$ estimator that validates the operational claim without a fast-mixing kernel. In this field, a sharp, well-instrumented negative — A6 unreachable, the $R^*=136>96$ wall, Goodhart on the proxy — is the contribution.