Benchmarking Recursive-Collapse Warning Claims Under Matched False-Positive Control

David Mullett

arXiv: 2606.00329 · v1 · pith:VSKHRRUS · submitted 2026-05-29 · 📡 eess.SY · cs.LG · cs.SY · stat.ML

keywords: benchmark, recursive, under, benchmarks, claims, contract, detector, directional

Recursive systems can enter collapse-like regimes (self-reinforcing amplification, persistent recursion, and narrowing diversity that mask accelerating internal degradation) before overt failure becomes visible. We introduce Loopzero, a claim-bounded benchmark framework for testing whether recursive failures follow a directional telemetry pattern: rising gain ($G$) and recursive persistence ($p$) with declining diversity ($\delta$). The claim boundary is specified in Lean; the Lean artifact does not verify real telemetry, benchmark validity, or detector performance. We evaluate the bridge from this formal claim to observed telemetry on two frozen public-artifact benchmarks: a segmented public-markets benchmark (Volmageddon 2018, COVID MWCB 2020) and an offline deterministic recommender replay on MovieLens-25M. Detectors are evaluated under a locked, pre-registered equal-false-positive contract ($\mathrm{FP} \in [0.03, 0.07]$), so every configuration faces the same alert budget. Neither the standard comparators tested nor Loopzero's pre-registered quantile detector achieved an accepted operating point. Directional witness alignment held on both canonical benchmarks; adjacent-horizon and row-level limitations are disclosed. Digitized LLM training-loop trajectories from Shumailov et al. (2024) are directionally consistent with the pattern; matched-FP evaluation in that domain is deferred. The contribution is a reproducible, falsifiable benchmark framework for evaluating recursive-collapse warning claims under an explicit alert-budget contract, in which non-acceptance is reported as a first-class scientific outcome.
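The equal-false-positive contract can likewise be sketched under stated assumptions. Suppose a detector emits one score per window and its threshold is set at the nearest-rank $(1 - \text{target})$ quantile of scores on non-collapse calibration windows. Suppose also that an operating point is admissible only if its false-positive rate on non-collapse windows falls inside the locked band $[0.03, 0.07]$. The abstract gives neither Loopzero's exact quantile rule nor its full acceptance criteria, which may also constrain detection performance. `quantile_threshold`, `fp_rate`, and `within_budget` are therefore hypothetical names.

```python
import math

# Locked equal-false-positive band stated in the abstract.
FP_BAND = (0.03, 0.07)


def quantile_threshold(neg_scores, target_fp=0.05):
    """Hypothetical quantile detector: threshold at the nearest-rank
    (1 - target_fp) quantile of non-collapse calibration scores."""
    ranked = sorted(neg_scores)
    k = math.ceil((1.0 - target_fp) * len(ranked)) - 1
    return ranked[min(max(k, 0), len(ranked) - 1)]


def fp_rate(neg_scores, threshold):
    """Fraction of non-collapse windows whose score would raise an alert."""
    return sum(s > threshold for s in neg_scores) / len(neg_scores)


def within_budget(fp, band=FP_BAND):
    """True if the observed false-positive rate lies inside the locked band."""
    lo, hi = band
    return lo <= fp <= hi


neg = list(range(100))              # toy non-collapse scores 0..99
t = quantile_threshold(neg, 0.05)   # 94
fp = fp_rate(neg, t)                # 0.05
print(t, fp, within_budget(fp))     # 94 0.05 True
```

In a real evaluation the threshold would be fit on calibration windows and the false-positive rate measured on held-out non-collapse windows. Fitting and scoring on the same toy set, as above, only illustrates the mechanics. Under that split a pre-registered threshold can drift outside the band, which is one way a detector ends up with no accepted operating point.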

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

When Top-1 Fails: Calibrating LoRA Monitors for Masked Diffusion LMs
cs.LG 2026-06 unverdicted novelty 4.0

Empirical test shows top-1 argmax concentration has zero precision as collapse warning in DLM LoRA training due to pre-equilibrium saturation while max gradient norm provides usable but family-specific detection on sh...