This is the spine of the program: the precise statement of the thermodynamic-trainability quantity , the corrected observable-projected target it factorizes into, and the discipline that keeps a proven conditional from being mistaken for a validated law.
The question
For one reverse-process EBM layer of a Denoising Thermodynamic Model (DTM) — energy , model law on , sampled by a reversible Gibbs kernel — does it train? The training gradient component (DTM paper Eq. (14)) is a data-minus-model difference of the per-parameter observable . All the mixing pain is in the model (negative) phase, estimated by Gibbs sampling. The question is whether one scalar predicts when that estimate is too noisy to descend on.
The setup
What you measure is the operational quantity — the gradient SNR, squared:
This carries the [solid] tag. means SGD descends; is the training plateau of DTM paper Fig. 5(b). Crucially this is an estimation plateau — the recoverability of collapses, the estimator MSE swamps the signal — not signal-extinction. That is the deliberate contrast with the quantum barren plateau, where the true gradient variance itself vanishes. Same phenomenology, different mechanism.
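A minimal sketch of the estimation-plateau contrast, assuming the squared SNR is the squared true gradient component divided by the estimator's mean-squared error. The page's own formula and symbols are not reproduced here; `snr_squared` and the toy numbers are hypothetical. It shows a fixed, nonzero signal whose squared SNR collapses only because the estimates scatter more widely.

```python
def snr_squared(true_gradient, estimates):
    """Squared signal over estimator MSE (assumed definition, for illustration)."""
    mse = sum((g - true_gradient) ** 2 for g in estimates) / len(estimates)
    return true_gradient ** 2 / mse


# Hypothetical toy numbers: the true gradient component is identical in both cases.
true_gradient = 0.2
tight_estimates = [0.1, 0.3, 0.15, 0.25]  # small scatter around the true value
noisy_estimates = [-0.8, 1.2, -0.3, 0.7]  # large scatter around the true value

snr_tight = snr_squared(true_gradient, tight_estimates)
snr_noisy = snr_squared(true_gradient, noisy_estimates)
```

In the second case the signal has not vanished; only its recoverability has. That is the distinction drawn against the quantum barren plateau.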
The estimator is canonical burn-in + window-: discard steps, average over the next . Standard reversible-MCMC machinery (Levin–Peres 2017, Lemma 12.2 / Thm 12.21) then gives a geometric bias and a variance , with slow-mode weight , gap , and .
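The burn-in + window recipe can be sketched on a toy model. This is a minimal illustration, assuming a field-free three-spin Ising chain sampled by random-scan single-site Gibbs (a reversible kernel); the model, couplings, step counts, and every function name are hypothetical and are not the DTM layer or the experiments' code. The exact expectation is obtained by enumeration so the windowed estimate can be compared against it.

```python
import itertools
import math
import random


def energy(spins, couplings):
    """Field-free Ising energy E(s) = -sum_{(a,b)} J_ab * s_a * s_b."""
    return -sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())


def gibbs_step(spins, couplings, rng):
    """Random-scan single-site Gibbs update, reversible w.r.t. exp(-E)/Z."""
    i = rng.randrange(len(spins))
    field = 0.0  # local field on site i from its coupled neighbours
    for (a, b), j in couplings.items():
        if a == i:
            field += j * spins[b]
        elif b == i:
            field += j * spins[a]
    p_up = 1.0 / (1.0 + math.exp(-2.0 * field))
    spins[i] = 1 if rng.random() < p_up else -1


def burn_in_window_estimate(couplings, n, observable, burn_in, window, rng):
    """Discard `burn_in` steps, then average `observable` over the next `window`."""
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    for _ in range(burn_in):
        gibbs_step(spins, couplings, rng)
    total = 0.0
    for _ in range(window):
        gibbs_step(spins, couplings, rng)
        total += observable(spins)
    return total / window


def exact_expectation(couplings, n, observable):
    """Model-phase expectation by brute-force enumeration (tiny n only)."""
    z = 0.0
    acc = 0.0
    for s in itertools.product((-1, 1), repeat=n):
        w = math.exp(-energy(s, couplings))
        z += w
        acc += w * observable(s)
    return acc / z


def pair_observable(spins):
    """Pair correlation s_0 * s_1, even under a global spin flip."""
    return spins[0] * spins[1]


couplings = {(0, 1): 0.5, (1, 2): 0.5}
rng = random.Random(0)
estimate = burn_in_window_estimate(
    couplings, 3, pair_observable, burn_in=200, window=50000, rng=rng
)
exact = exact_expectation(couplings, 3, pair_observable)
```

On a model this small the chain mixes in a handful of steps, so the estimate lands close to the enumerated value. The regime of interest on this page is the opposite one, where neither the burn-in nor the window is long relative to the relevant relaxation time.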
The result — and why the obvious form is wrong
The historical factorization, [conjectured], was the single-gap product
read as , computable without training to convergence and differentiable in the couplings (the HTDML property). It is superseded as written. Exact diagonalization (experiments/exp1-exact-diag/) and the RBM smoke test (experiments/exp2-thrml-smoke/) showed the anchor is the wrong object for two model-driven reasons (exp2 reproduced both under block-Gibbs, so they are not single-site artifacts):
- Mis-anchored. Under the spin-flip symmetry of the EBMs, the slowest mode is odd and exactly orthogonal to the even gradient observables — so (overlap in exp2) and the single- is a divide-by-symmetry-zero, over-predicting by –.
- Clustered. The observable-relevant slow structure clusters; a single relevant gap tracks in 23/48 exp1 cells, the multi-mode correction in 45/48.
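The orthogonality in the first bullet follows from symmetry alone and can be checked by enumeration. A minimal sketch, assuming a field-free Ising energy (so the model law is invariant under a global spin flip) and using total magnetization as a stand-in for an odd mode; the model, couplings, and function names are hypothetical and are not the exp1 or exp2 setup.

```python
import itertools
import math


def boltzmann(couplings, n):
    """All states and their probabilities under exp(+sum J_ab s_a s_b) / Z."""
    states = list(itertools.product((-1, 1), repeat=n))
    weights = [
        math.exp(sum(j * s[a] * s[b] for (a, b), j in couplings.items()))
        for s in states
    ]
    z = sum(weights)
    return states, [w / z for w in weights]


def inner(f, g, states, probs):
    """Inner product <f, g> in L2 of the model law."""
    return sum(p * f(s) * g(s) for s, p in zip(states, probs))


couplings = {(0, 1): 1.2, (1, 2): 0.7, (0, 2): 0.4}
states, probs = boltzmann(couplings, 3)


def magnetization(s):
    """Odd under s -> -s (stand-in for an odd slow mode)."""
    return sum(s)


pair_mean = sum(p * s[0] * s[1] for s, p in zip(states, probs))


def centered_pair(s):
    """Even under s -> -s: a centered pair-correlation observable."""
    return s[0] * s[1] - pair_mean


overlap = inner(magnetization, centered_pair, states, probs)
pair_variance = inner(centered_pair, centered_pair, states, probs)
```

Each state pairs with its global flip at equal probability and the odd-times-even product changes sign, so the overlap cancels to rounding error while the even observable keeps a nonzero variance.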
The corrected target is (observable-projected, multi-mode). It restricts attention to the modes the gradient sees, , builds the aggregate timescale (half-Sokal ), and the harmonic-mean gap over . The predictor is
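To make the projection concrete, here is a hedged sketch of the two aggregates for a reversible kernel whose non-trivial eigenvalues and squared observable overlaps are already known. It assumes Sokal's convention for the integrated autocorrelation time (one half plus the sum of lagged autocorrelations) and an overlap-weighted harmonic mean of the gaps; whether these match the page's exact normalizations is an assumption, and the overlap threshold standing in for the mode-selection rule, the function name, and the numbers are all hypothetical.

```python
def projected_summaries(eigenvalues, squared_overlaps, tol=1e-12):
    """Overlap-weighted harmonic-mean gap and integrated autocorrelation time.

    eigenvalues: non-trivial eigenvalues lam_k of a reversible kernel, |lam_k| < 1.
    squared_overlaps: squared overlap of the centered observable with mode k.
    Modes with overlap at or below `tol` are treated as invisible to the observable.
    """
    seen = [(lam, w) for lam, w in zip(eigenvalues, squared_overlaps) if w > tol]
    total = sum(w for _, w in seen)
    weights = [(lam, w / total) for lam, w in seen]
    harmonic_gap = 1.0 / sum(w / (1.0 - lam) for lam, w in weights)
    tau_int = 0.5 * sum(w * (1.0 + lam) / (1.0 - lam) for lam, w in weights)
    return harmonic_gap, tau_int


# Hypothetical spectrum: the slowest mode (0.999) has zero overlap with the observable.
eigenvalues = [0.999, 0.9, 0.8]
squared_overlaps = [0.0, 0.5, 0.5]

single_gap = 1.0 - max(eigenvalues)
harmonic_gap, tau_int = projected_summaries(eigenvalues, squared_overlaps)
```

With these numbers the single-gap anchor is 0.001, while the projected harmonic-mean gap is about 0.133 and the projected autocorrelation time is 7 steps: the globally slowest mode never enters because the observable does not see it.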
Scope and caveats — the two-tier tag
This is the precise part. The factorization carries a split tag (researcher-conferred 2026-06-01):
- Conditional factorization: [solid]. In regime A1–A8 + plateau () + F4, . This is a written proof across six obligations O1–O6, each adversarially verified. O1.c is flipped to proven-here, the wiki's first and only terminal-tagged block (the projection-vs-conditioning SNR invariance). O2–O6 are [solid] assemblies closing to Levin–Peres 2017 + Younes 1999 (whose §7 asymptotic-variance object grounds what is, no more). The numerical chain needs no A9.
- Operational / unconditional claim: [conjectured]. " on a real DTM at the one runs" is gated on A7 (overlapping-bulk relaxation) and , both assumed. The conditional is vacuous on a real DTM's plateau until those gates are met.
Neither tier is validated. The supporting evidence is construction-confirmation on small/moderate controlled models: exp1 (45/48 cells) + exp2 (92–99% across , both kernels). At scale it does not hold up: experiments/exp3-htdml-embedding/ on the real 60_12 MNIST DTM is untested at adequate equilibration (–, the feasible window; the linear-in- predictor over-predicts, reaches 5–6 at ). The re-freeze on a reversible kernel (experiments/exp4-reversible-50tau/) found UNRESOLVED — (166.7 to 5,280 over , constant) — so is unreachable, not merely unmet (P0-HALT). exp6 reproduced at every checkpoint .
The honest reading is the A2 ↔ A6 structural-obstruction observation: reversibility (A2, load-bearing for O4/O5) and (A6) are not contradictory, but they are operationally antagonistic in exactly the multimodal/plateau regime the theorem is about — satisfying A2 pushes the sampler toward the slow plateau where A6 is hardest. Whether this is fundamental or scale-dependent stays open: at small scale it is escapable (exp7 found a crossable sweet spot, 25/64 cells; exp12 cut 14–22 with reversible parallel tempering), but the trained-DTM PT run (exp16) was withdrawn for an init-weight kernel bug, so the at-scale reversal is not established.
The six-entry risk ledger records each threat [open] with a mitigation: (1) slow-mode cluster (the big one — fires, drove the correction); (2) differentiability of through the embedding (O5.c partial — is smooth across within-cluster crossings via the cluster Riesz projector, but the -membership boundary stays open); (3) circularity at the plateau (circumvented, live where the reference is unconverged — exp3's moved 23%); (4) positive phase not free (confirmed subdominant, median 0.046 on exp3; the at-scale reversal withdrawn); (5) the regime gate (binding; may make it unsatisfiable at scale); (6) canonical-estimator choice (burn-in + window fixed; PCD sits outside as a moving-target tracking-lag family).
What this feeds
Everything else in the program feeds or tests this page: the MET corollary reads its three Ragone-shaped factors (, , budget starvation) through the corrected projected quantities; the HTDML objective (exp8/exp9/exp11) exercises the differentiable-in- property; the at-scale validation path (exp14 → a GPU DTM PT P0) is the still-pending route to → validated.
What this feeds: the spine that every literature bridge, parent translation, and experiment ladder either supports or attacks — and the standing open step is an at-scale run with A7 + met on a reversible kernel.
Sources
- Jelinčič et al. 2025, Denoising Thermodynamic Models, arXiv:2510.23972 (Eq. (14), Fig. 5b, App. E/G/H).
- Levin & Peres 2017, Markov Chains and Mixing Times (Lemma 12.2, Thm 12.21).
- Younes 1999, Markovian stochastic-approximation convergence (§7, Thm 3 — the variance object).
- Hinton 2002, contrastive divergence (the data-minus-model gradient lineage).