Exp 18 — PT Replica Feasibility: PT-MARGINAL · Thermodynamic Machine Learning

Shrinking the per-edge temperature step alone moves swap acceptance ×33, refuting the saturating-ceiling prediction — but a uniform schedule over a non-uniform specific heat over-resolves the cold end and starves the hot, so no in-band ladder exists at $R \le 24$ .

This is the complete technical record for experiments/exp18-pt-replica-feasibility/. Here we keep the numbers, the gates, and the claim-status discipline.

The question

After exp17 was held and the post-exp15 baseline corrected, the question is feasibility-first: on the corrected trained 60_12 DTM at $t=200$ , does shrinking the per-edge step alone — a uniform-Δα reversible-PT ladder over the fixed span $[0.5, 1.0]$ — carry swap acceptance into the frozen band $[0.15, 0.60]$ on every edge, and deliver $\ge N_{RT}=5$ round-trips, at a tractable $R$ ? The probe stops at the ladder, upstream of any $T_O$ , speedup, $P4$ / $F4$ , or $Q_{op}$ -vs- $Q_{struct}^{\perp}$ read. It is MEASURE-ONLY: no cell moves a tag.

The setup

Stage A is a flat pilot uniform-Δα acceptance sweep: $R \in \{4,6,8,12,16,24\}$ , L_LADDER=2000, N_LADDER_CHAINS=32, pooled over LADDER_SEEDS=(300,301,302), span fixed $[0.5,1.0]$ , band $[0.15,0.60]$ . Round-trips are a diagnostic at flat pilot $L$ , not a gate (all rows showed round_trips_min_diag = 0). Six provenance/cert gates passed first — most load-bearing is the trained-weight refresh: stale_vs_trained = 60.64, refreshed_vs_trained = 0.0, so the per-replica LOCAL kernel targets the same trained parameterization as the swap energy. exp15/16 silently used INIT weights for the local kernels; this run's acceptances are therefore a valid read of the trained DTM, unlike them. Ran on an H200, jax=0.10.1, wall 2.143 GPU-h, DECISION: PROCEED.

The result

Two competing edges, never simultaneously in band:

| $R$ | per-edge Δα | cold edge | hot edge (worst) | band_ok | hot_ok | |---:|---:|---:|---:|:--:|:--:| | 4 | 0.1667 | 0.0182 | 0.0024 | ✗ | ✓ | | 6 | 0.1000 | 0.0843 | 0.0056 | ✗ | ✓ | | 8 | 0.0714 | 0.1836 | 0.0168 | ✗ | ✓ | | 12 | 0.0455 | 0.3159 | 0.0836 | ✗ | ✓ | | 16 | 0.0333 | 0.4359 | 0.1447 | ✗ | ✓ | | 24 | 0.0217 | 0.6008 | 0.2709 | ✗ | ✓ |

Cold-edge acceptance rises monotonically ×33 (0.0182 → 0.6008), crossing the floor (0.15) at $R \approx 7$ – $8$ and overshooting the ceiling by $R=24$ — 11 of 23 edges exceed 0.60 (up to 0.664). The hot edge rises too (0.0024 → 0.2709) but lags, crossing the floor only at $R \approx 18$ (extrapolated; $R\in\{18,20,22\}$ were off-grid). So no uniform ladder at $R \le 24$ has every edge in band at once: at small $R$ the hot edges starve below 0.15; by the $R$ where the hot edge clears, the cold edges have already overshot 0.60. R_star=null, no Stage B entered.

Why uniform is the wrong schedule. The single-replica scan gives a non-uniform specific heat $C(\alpha)$ ranging 13,143 → 69,078 (min near $\alpha\approx 0.56$ , max near $\alpha\approx 0.88$ ), $\int\!\sqrt{C}\,d\alpha = 101.5$ , ⇒ Gaussian-overlap $R^*_{predicted}=61$ for the optimal per-edge $A=0.234$ . Per-edge acceptance tracks $\Delta\alpha\cdot\sqrt{C}$ locally, so a constant $\Delta\alpha$ yields non-constant acceptance — exactly the cold-overshoot / hot-starve split. The hottest replica at $\alpha_{top}=0.5$ also does not itself decorrelate: $\tau_{hot}=317.8$ (hot_ok held only because L_LADDER=2000 $\ge 5\cdot 317.8 \approx 1589$ , ~26% Sokal headroom — a marginally-resolved estimate).

This is PT-MARGINAL: replica-/placement-bound but not hopeless. The primary §2 prediction — cold-edge intercept saturating $<0.15$ as $\Delta\alpha\to 0$ , i.e. PT-CEILING-INFEASIBLE — is refuted in direction; acceptance moves, so the adjacent-overlap-wall reading is excluded. This was the documented §2 "informative surprise," not a miss: the confounded geometric $\alpha_R$ -sweep fit does not transfer to the uniform-Δα family. The hot-end prediction ( $\tau_{hot}$ stays large, fork pointed at $\beta_{top}\to 0$ ) was confirmed in direction. No threshold was relaxed (SWAP_BAND/N_RT/SOKAL_C frozen verbatim).

Scope and caveats

It does not show that the DTM cannot be mixed. It is config-scoped to one trained DTM (60_12, SEED=0, $t=200$ , INPUT_IDX=0), one span, one uniform ladder family. It is not a pass and feeds no claim — a ladder still does not mix this DTM. The grid skips $R\in\{18,20,22\}$ , so a narrow uniform window $\approx[18,22]$ might graze the band on a knife-edge, but $\tau_{hot}\approx 318$ makes the Stage-B round-trip gate doubtful and the schedule fragile. $R^*_{predicted}=61$ is a Gaussian-unimodal lower bound for a multimodal landscape — it can corroborate infeasibility, never rule mixing in. Burn-in init (no exact $\pi_r$ ): a STOP reads "PT does not mix under this schedule at tractable $R$ ," not "the DTM cannot mix." Conditional factorization stays [solid], operational claim stays [conjectured] — both unchanged, no tag flip.

What it feeds

This corroborates the spine's Risk-5 (A2↔A6 antagonism) and sharpens the post-exp17 Risk-4 picture, but writes no annotation. It refines the corrected baseline (../exp15-recheck-trained-weights/, LADDER-INADEQUATE-TRAINED) from "R4 fails" to "uniform-Δα fails by placement, hot end $\tau\approx 318$ ." The diagnostics point two complementary pivots: equal-acceptance placement (rungs so $\Delta\alpha\cdot\sqrt{C}\approx$ const) to fix acceptance shape, and a hotter top ( $\beta_{top}\to 0$ ) to fix hot-end ergodicity. The pivot is the researcher's; this run only narrows the menu.

What this feeds: exp19 — equal-acceptance placement plus a hotter top, reusing the same A2-reversible machinery and a fresh gate-1.