Shrinking the per-edge temperature step alone moves swap acceptance ×33, refuting the saturating-ceiling prediction — but a uniform schedule over a non-uniform specific heat over-resolves the cold end and starves the hot, so no in-band ladder exists at .
This is the complete technical record for experiments/exp18-pt-replica-feasibility/. Here we keep the numbers, the gates, and the claim-status discipline.
The question
After exp17 was held and the post-exp15 baseline corrected, the question is feasibility-first: on the corrected trained 60_12 DTM at , does shrinking the per-edge step alone — a uniform-Δα reversible-PT ladder over the fixed span — carry swap acceptance into the frozen band on every edge, and deliver round-trips, at a tractable ? The probe stops at the ladder, upstream of any , speedup, /, or -vs- read. It is MEASURE-ONLY: no cell moves a tag.
The setup
Stage A is a flat pilot uniform-Δα acceptance sweep: , L_LADDER=2000, N_LADDER_CHAINS=32, pooled over LADDER_SEEDS=(300,301,302), span fixed , band . Round-trips are a diagnostic at flat pilot , not a gate (all rows showed round_trips_min_diag = 0). Six provenance/cert gates passed first — most load-bearing is the trained-weight refresh: stale_vs_trained = 60.64, refreshed_vs_trained = 0.0, so the per-replica LOCAL kernel targets the same trained parameterization as the swap energy. exp15/16 silently used INIT weights for the local kernels; this run's acceptances are therefore a valid read of the trained DTM, unlike them. Ran on an H200, jax=0.10.1, wall 2.143 GPU-h, DECISION: PROCEED.
The result
Two competing edges, never simultaneously in band:
| | per-edge Δα | cold edge | hot edge (worst) | band_ok | hot_ok |
|---:|---:|---:|---:|:--:|:--:|
| 4 | 0.1667 | 0.0182 | 0.0024 | ✗ | ✓ |
| 6 | 0.1000 | 0.0843 | 0.0056 | ✗ | ✓ |
| 8 | 0.0714 | 0.1836 | 0.0168 | ✗ | ✓ |
| 12 | 0.0455 | 0.3159 | 0.0836 | ✗ | ✓ |
| 16 | 0.0333 | 0.4359 | 0.1447 | ✗ | ✓ |
| 24 | 0.0217 | 0.6008 | 0.2709 | ✗ | ✓ |
Cold-edge acceptance rises monotonically ×33 (0.0182 → 0.6008), crossing the floor (0.15) at – and overshooting the ceiling by — 11 of 23 edges exceed 0.60 (up to 0.664). The hot edge rises too (0.0024 → 0.2709) but lags, crossing the floor only at (extrapolated; were off-grid). So no uniform ladder at has every edge in band at once: at small the hot edges starve below 0.15; by the where the hot edge clears, the cold edges have already overshot 0.60. R_star=null, no Stage B entered.
Why uniform is the wrong schedule. The single-replica scan gives a non-uniform specific heat ranging 13,143 → 69,078 (min near , max near ), , ⇒ Gaussian-overlap for the optimal per-edge . Per-edge acceptance tracks locally, so a constant yields non-constant acceptance — exactly the cold-overshoot / hot-starve split. The hottest replica at also does not itself decorrelate: (hot_ok held only because L_LADDER=2000 , ~26% Sokal headroom — a marginally-resolved estimate).
This is PT-MARGINAL: replica-/placement-bound but not hopeless. The primary §2 prediction — cold-edge intercept saturating as , i.e. PT-CEILING-INFEASIBLE — is refuted in direction; acceptance moves, so the adjacent-overlap-wall reading is excluded. This was the documented §2 "informative surprise," not a miss: the confounded geometric -sweep fit does not transfer to the uniform-Δα family. The hot-end prediction ( stays large, fork pointed at ) was confirmed in direction. No threshold was relaxed (SWAP_BAND/N_RT/SOKAL_C frozen verbatim).
Scope and caveats
It does not show that the DTM cannot be mixed. It is config-scoped to one trained DTM (60_12, SEED=0, , INPUT_IDX=0), one span, one uniform ladder family. It is not a pass and feeds no claim — a ladder still does not mix this DTM. The grid skips , so a narrow uniform window might graze the band on a knife-edge, but makes the Stage-B round-trip gate doubtful and the schedule fragile. is a Gaussian-unimodal lower bound for a multimodal landscape — it can corroborate infeasibility, never rule mixing in. Burn-in init (no exact ): a STOP reads "PT does not mix under this schedule at tractable ," not "the DTM cannot mix." Conditional factorization stays [solid], operational claim stays [conjectured] — both unchanged, no tag flip.
What it feeds
This corroborates the spine's Risk-5 (A2↔A6 antagonism) and sharpens the post-exp17 Risk-4 picture, but writes no annotation. It refines the corrected baseline (../exp15-recheck-trained-weights/, LADDER-INADEQUATE-TRAINED) from "R4 fails" to "uniform-Δα fails by placement, hot end ." The diagnostics point two complementary pivots: equal-acceptance placement (rungs so const) to fix acceptance shape, and a hotter top () to fix hot-end ergodicity. The pivot is the researcher's; this run only narrows the menu.
What this feeds: exp19 — equal-acceptance placement plus a hotter top, reusing the same A2-reversible machinery and a fresh gate-1.