pith. machine review for the scientific record.

arxiv: 2604.27282 · v1 · submitted 2026-04-30 · 💻 cs.CY · cs.LG · stat.AP

Recognition: unknown

The Likelihood Ratio Wall: Structural Limits on Accurate Risk Assessment for Rare Violence

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 08:04 UTC · model grok-4.3

classification 💻 cs.CY · cs.LG · stat.AP
keywords pretrial risk assessment · likelihood ratio · positive predictive value · violent recidivism · base rates · surveillance ceiling · number needed to detain · over-policing

The pith

Low base rates for violent re-offense create a hard precision limit that current risk tools cannot cross.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives a universal bound called the Likelihood Ratio Wall showing that at violent re-arrest rates of 2-5 percent, tools would need far stronger separation power than they currently have to reach even 50 percent positive predictive value when labeling someone high risk. This follows directly from the standard Bayes relationship linking likelihood ratios, low prevalence, and the proportion of true positives among positive labels. The bound means that tools can look reasonably accurate on aggregate metrics yet still be wrong most of the time they flag an individual for violence. The authors further show that simple score recalibration leaves the underlying separation limit untouched and that over-policing of non-offenders imposes an additional Surveillance Ceiling on precision for affected groups.
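As a quick arithmetic check of this relationship (our sketch, not code from the paper), the standard Bayes relation PPV = LR+·p / (LR+·p + 1 − p) can be inverted to show how large a positive likelihood ratio a tool would need at these base rates:

```python
# Minimum positive likelihood ratio (LR+) needed to reach a target PPV
# at a given base rate, from the standard Bayes relation
#   PPV = LR+ * p / (LR+ * p + (1 - p)).
def required_lr(target_ppv: float, base_rate: float) -> float:
    # Invert the relation: LR+ = [PPV / (1 - PPV)] * [(1 - p) / p]
    posterior_odds = target_ppv / (1 - target_ppv)
    prior_odds = base_rate / (1 - base_rate)
    return posterior_odds / prior_odds

# At the paper's stated 2-5% violent re-arrest base rates, a 50% PPV
# requires LR+ of roughly 19 (at 5%) to 49 (at 2%).
for p in (0.02, 0.05):
    print(f"base rate {p:.0%}: LR+ >= {required_lr(0.5, p):.0f}")
```

The 19-to-49 range is what makes the "wall" metaphor apt: diagnostic-grade separation of that strength is rare even in clinical screening.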

Core claim

We derive a universal precision bound—the Likelihood Ratio Wall—showing that when violent re-arrest rates are low (2-5%), achieving even a 50% hit rate among people labeled 'high risk' (positive predictive value, or PPV) would require tools far more discriminative than current instruments appear to be. For rare outcomes, a tool can have respectable-looking performance metrics and still be wrong most of the time it flags someone as 'high risk for violence.' We show that post-hoc score recalibration cannot solve this problem because it does not improve the tool's underlying ability to separate true positives from false positives. We further prove a Surveillance Ceiling: when over-policing inflates recorded 'risk factors' among those who would not re-offend, the maximum achievable precision is structurally lower for over-policed groups, even at equal offense rates.

What carries the argument

The Likelihood Ratio Wall, a bound on achievable positive predictive value obtained from the Bayes relation between likelihood ratios and base rates for rare events.

If this is right

  • Current instruments cannot deliver high-confidence individual labels for rare violent re-offense without many false positives.
  • Recalibrating existing scores leaves the precision bound unchanged because it does not increase separation of true and false positives.
  • Over-policed groups face a structurally lower ceiling on precision even when their true offense rates match other groups.
  • Decisions should incorporate the Number Needed to Detain rather than relying on binary high-risk flags.
  • Fairness discussions that ignore base-rate limits and surveillance effects remain incomplete.
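To make the Number Needed to Detain concrete, here is an illustrative sketch (the LR+ value and base rate are hypothetical, not estimates from the paper): since NND is roughly the reciprocal of PPV, a tool with an assumed LR+ of 5 at a 3% base rate implies detaining about eight people per violent offense prevented, even under the optimistic assumption that detention prevents every offense a true positive would commit.

```python
import math

# Illustrative (hypothetical) numbers, not estimates from the paper:
# a tool with LR+ = 5 applied at a 3% violent re-arrest base rate.
def ppv(lr_plus: float, base_rate: float) -> float:
    # Bayes in odds form: posterior odds = LR+ * prior odds,
    # then convert odds back to a probability.
    odds = lr_plus * base_rate / (1 - base_rate)
    return odds / (1 + odds)

def number_needed_to_detain(lr_plus: float, base_rate: float) -> int:
    # NND = 1 / PPV, assuming (optimistically) that detention prevents
    # every offense a flagged true positive would have committed.
    return math.ceil(1 / ppv(lr_plus, base_rate))

print(number_needed_to_detain(5, 0.03))  # about 8 detentions per offense prevented
```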

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar structural limits likely appear in any domain that tries to predict rare binary events from noisy observational features, such as rare disease screening or low-probability security threats.
  • Policy could shift from individualized detention thresholds toward group-level or resource-allocation approaches that do not require crossing the wall.
  • Longitudinal data that track both recorded and unrecorded offending could test whether the Surveillance Ceiling is the dominant practical constraint.
  • The bound supplies a concrete benchmark against which future instrument improvements can be measured rather than aggregate accuracy alone.

Load-bearing premise

That the observed discrimination power of current instruments is representative of what is achievable, and that the standard Bayes model relating likelihood ratios to PPV holds without unmodeled dependencies between features and policing intensity.

What would settle it

A large-scale empirical test that measures the actual likelihood ratio of an existing pretrial instrument and shows whether it exceeds the threshold required to reach 50 percent PPV at 2-5 percent base rate, or a direct count of realized PPV in a jurisdiction with known low violent re-arrest prevalence.
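Such a test is mechanically simple once outcome data exist. A minimal sketch, with hypothetical validation counts standing in for real jurisdiction data:

```python
# Estimate LR+ from a validation confusion matrix and compare it to the
# wall. All counts below are hypothetical placeholders, not real data.
def lr_plus(tp: int, fn: int, fp: int, tn: int) -> float:
    sensitivity = tp / (tp + fn)          # P(flagged | re-offends)
    false_pos_rate = fp / (fp + tn)       # P(flagged | does not re-offend)
    return sensitivity / false_pos_rate

def wall(target_ppv: float, base_rate: float) -> float:
    # Minimum LR+ required for the target PPV at this base rate.
    return (target_ppv / (1 - target_ppv)) * ((1 - base_rate) / base_rate)

# Hypothetical cohort: 10,000 defendants, 3% violent re-arrest rate.
observed = lr_plus(tp=180, fn=120, fp=2000, tn=7700)
print(f"observed LR+ = {observed:.1f}, wall at 50% PPV = {wall(0.5, 0.03):.1f}")
```

With these invented counts the observed LR+ (about 2.9) sits an order of magnitude below the wall (about 32), which is the shape of comparison the paper calls for.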

Figures

Figures reproduced from arXiv: 2604.27282 by Marco Pollanen.

Figure 1. The Likelihood Ratio Wall. Curves show the LR required for each PPV level. The gray region below the solid curve is …
Figure 2. Stylized schematic (not empirical density estimates) of the Surveillance Ceiling mechanism. Over-policing shifts the …
Original abstract

Pretrial risk assessment tools are used on over one million U.S. defendants each year, yet their use for predicting rare violent re-offense faces a basic statistical barrier. We derive a universal precision bound -- the Likelihood Ratio Wall -- showing that when violent re-arrest rates are low (2-5%), achieving even a 50% hit rate among people labeled "high risk" (positive predictive value, or PPV) would require tools far more discriminative than current instruments appear to be. For rare outcomes, a tool can have respectable-looking performance metrics and still be wrong most of the time it flags someone as "high risk for violence." We show that post-hoc score recalibration cannot solve this problem because it does not improve the tool's underlying ability to separate true positives from false positives. We further prove a Surveillance Ceiling: when over-policing inflates recorded "risk factors" among those who would not re-offend, the maximum achievable precision is structurally lower for over-policed groups, even at equal offense rates. We translate these results into the Number Needed to Detain (how many people must be detained to prevent one violent offense), and propose that risk reports should communicate this uncertainty explicitly. Our findings suggest that for rare violent outcomes, debates about fairness metrics alone are incomplete: under current data regimes, the available features may not support high-confidence individualized detention decisions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper derives a 'Likelihood Ratio Wall' from the standard Bayes relation between positive predictive value (PPV), likelihood ratio (LR+), and base rate, showing that at violent re-arrest base rates of 2-5% even a 50% PPV requires LR+ values substantially higher than those exhibited by existing pretrial risk instruments. It further proves a 'Surveillance Ceiling' arising when over-policing inflates recorded risk factors among non-offenders, demonstrates that post-hoc recalibration cannot overcome the underlying discrimination limit, introduces the Number Needed to Detain metric, and recommends explicit communication of uncertainty in risk reports. The central claim is that these statistical and data-regime constraints impose structural limits on high-confidence individualized detention decisions for rare violent outcomes.

Significance. If the modeling of policing effects is completed and the empirical claims about current tool performance are substantiated, the work supplies a clean, algebraically transparent bound that formalizes long-standing concerns about base-rate problems in criminal justice risk assessment. The explicit derivation of the required LR+ and the Surveillance Ceiling provide a useful quantitative language for policy discussions that have often remained qualitative. The translation into Number Needed to Detain and the call for uncertainty disclosure are practical contributions that could improve how risk scores are presented to judges and defendants.

major comments (2)
  1. [Surveillance Ceiling section] The model correctly captures the channel in which over-policing inflates false-positive risk factors, but it omits any symmetric term for differential detection of the true violent offense (i.e., the outcome variable). Consequently the derived numerical wall describes performance on the observed re-arrest indicator rather than on the policy-relevant quantity of actual violence prevented; the manuscript should either extend the model to include both channels or explicitly qualify the scope of the bound.
  2. [Likelihood Ratio Wall derivation] The algebraic rearrangement of PPV = (LR+ * base_rate) / (1 - base_rate + LR+ * base_rate) to solve for the minimum LR+ needed for a target PPV is correct and parameter-free given the base rate. However, the load-bearing claim that current instruments cannot supply the required LR+ rests on specific empirical estimates; the paper should state the exact LR+ values, data sources, and jurisdictions used for that comparison so readers can assess generalizability.
minor comments (2)
  1. The abstract and introduction should more sharply distinguish the universal mathematical bound (which holds under the stated Bayes assumptions) from the contingent empirical claim about the discrimination power of existing tools.
  2. Notation for LR+ and PPV should be introduced with a brief reference to the diagnostic-testing literature at first use to aid readers outside criminology.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments, which have helped us sharpen the scope and empirical transparency of the manuscript. We address each major comment in turn below.

Point-by-point responses
  1. Referee: [Surveillance Ceiling section] The model correctly captures the channel in which over-policing inflates false-positive risk factors, but it omits any symmetric term for differential detection of the true violent offense (i.e., the outcome variable). Consequently the derived numerical wall describes performance on the observed re-arrest indicator rather than on the policy-relevant quantity of actual violence prevented; the manuscript should either extend the model to include both channels or explicitly qualify the scope of the bound.

    Authors: We agree that the Surveillance Ceiling derivation applies to the observed re-arrest indicator, as the model incorporates over-policing effects on recorded risk factors and the administratively observed outcome. This focus is deliberate because risk tools are trained, validated, and used on observable data, and detention decisions are made on that basis. We acknowledge that differential detection of actual (unobserved) violence would constitute an additional channel. Extending the formal model to include both channels would require non-identifiable assumptions about group-specific rates of unreported violence, which cannot be recovered from current administrative datasets. We will therefore revise the manuscript to (i) explicitly qualify the scope of the Surveillance Ceiling to performance on the observed re-arrest indicator and (ii) add a concise discussion paragraph noting the implications of differential outcome detection as a limitation for interpreting the bound in terms of actual violence prevented. revision: partial

  2. Referee: [Likelihood Ratio Wall derivation] The algebraic rearrangement of PPV = (LR+ * base_rate) / (1 - base_rate + LR+ * base_rate) to solve for the minimum LR+ needed for a target PPV is correct and parameter-free given the base rate. However, the load-bearing claim that current instruments cannot supply the required LR+ rests on specific empirical estimates; the paper should state the exact LR+ values, data sources, and jurisdictions used for that comparison so readers can assess generalizability.

    Authors: We thank the referee for confirming the algebraic correctness of the Likelihood Ratio Wall derivation. The empirical comparison in the manuscript relies on published performance metrics from evaluations of pretrial instruments. In the revision we will insert a new subsection that explicitly reports the LR+ point estimates (and, where available, confidence intervals), the instruments (Public Safety Assessment, COMPAS, and others), the source studies, the jurisdictions or datasets (Broward County, Kentucky, and additional U.S. sites), and the base rates employed in those evaluations. We will also include a short sensitivity table showing how plausible variation in these LR+ values shifts the distance to the wall. This addition will make the empirical grounding fully transparent and allow readers to judge generalizability directly. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation applies standard Bayes relations to stated base rates

Full rationale

The Likelihood Ratio Wall is obtained by algebraic rearrangement of the standard PPV formula PPV = (LR+ * base_rate) / (1 - base_rate + LR+ * base_rate) to solve for the minimum LR+ needed for a target PPV at low base rates. This is a direct, parameter-free consequence of the definitions of likelihood ratio and positive predictive value; no parameters are fitted within the paper and then relabeled as predictions. The Surveillance Ceiling is presented as a modeled proof under explicit assumptions about differential recording of risk factors, without reducing to self-citation chains, author-specific uniqueness theorems, or ansatzes imported from prior work. No self-definitional loops, fitted-input predictions, or renaming of known results occur. The central claims remain independent of any internal data fitting and rest on externally verifiable Bayes relations and the stated modeling assumptions.
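The rearrangement described above can be written out explicitly; with base rate π, it follows in one step from the odds form of Bayes' theorem:

```latex
% Odds form of Bayes' theorem at prevalence \pi:
%   posterior odds = LR^{+} \times prior odds
\mathrm{PPV} = \frac{LR^{+}\,\pi}{LR^{+}\,\pi + (1-\pi)}
\quad\Longleftrightarrow\quad
LR^{+}_{\min} = \frac{\mathrm{PPV}}{1-\mathrm{PPV}} \cdot \frac{1-\pi}{\pi}
% At PPV = 0.5 the first factor is 1, so LR^{+}_{\min} = (1-\pi)/\pi:
% 19 at \pi = 0.05 and 49 at \pi = 0.02.
```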

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard probabilistic relations between base rates, likelihood ratios, and positive predictive value, plus domain assumptions about low violent re-arrest prevalence and the effects of differential policing intensity on recorded features.

free parameters (1)
  • violent re-arrest base rate = 2-5%
    Input value (2-5%) used to compute the discrimination required for target PPV levels in the Likelihood Ratio Wall derivation
axioms (2)
  • standard math Positive predictive value is a function of the likelihood ratio and the base rate via Bayes' theorem
    Invoked as the foundation for deriving the universal precision bound
  • domain assumption Over-policing inflates recorded risk factors among non-offenders
    Central premise for proving the Surveillance Ceiling effect on maximum achievable precision
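A toy numerical illustration of this second axiom (our construction, not the paper's formal model): if over-policing inflates recorded risk factors among non-offenders, the false-positive rate rises for the affected group, which mechanically caps its achievable LR+ and hence its PPV, even at an identical true offense rate.

```python
# Toy illustration of the Surveillance Ceiling mechanism: identical
# sensitivity and base rate in both groups, but over-policing inflates
# the recorded false-positive rate for one group. Numbers are invented.
def ppv(sensitivity: float, fpr: float, base_rate: float) -> float:
    # LR+ = sensitivity / FPR; posterior odds = LR+ * prior odds.
    odds = (sensitivity / fpr) * base_rate / (1 - base_rate)
    return odds / (1 + odds)

BASE_RATE = 0.03   # same true offense rate in both groups
SENS = 0.6         # same tool sensitivity in both groups

baseline = ppv(SENS, fpr=0.10, base_rate=BASE_RATE)       # ~0.157
over_policed = ppv(SENS, fpr=0.20, base_rate=BASE_RATE)   # ~0.085
print(f"baseline PPV {baseline:.3f} vs over-policed PPV {over_policed:.3f}")
```

Doubling the recorded false-positive rate roughly halves the achievable precision, with no change in anyone's underlying behavior.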

pith-pipeline@v0.9.0 · 5545 in / 1618 out tokens · 78625 ms · 2026-05-07T08:04:30.565290+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

32 extracted references · 13 canonical work pages

  1. [1] Douglas G. Altman and J. Martin Bland. 1994. Statistics Notes: Diagnostic tests 2: predictive values. BMJ 309, 6947 (1994), 102. doi:10.1136/bmj.309.6947.102
  2. [2] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Machine Bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
  3. [3] Ruha Benjamin. 2019. Race After Technology: Abolitionist Tools for the New Jim Code. Polity, Oxford, UK.
  4. [4] Alexandra Chouldechova. 2017. Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data 5, 2 (2017), 153–163. doi:10.1089/big.2016.0047
  5. [5] Danielle Keats Citron and Frank Pasquale. 2014. The Scored Society: Due Process for Automated Predictions. Washington Law Review 89, 1 (2014), 1–33.
  6. [6] Matthew DeMichele, Peter Baumgartner, Michael Wenger, Kelle Barrick, Megan Comfort, and Sanjay Misra. 2018. The Public Safety Assessment: A Re-validation in Kentucky. Technical Report. RTI International. https://advancingpretrial.org/psa/about/research/
  7. [7] Sarah L. Desmarais, Kiersten L. Johnson, and Jay P. Singh. 2016. Performance of Recidivism Risk Assessment Instruments in U.S. Correctional Settings. Psychological Services 13, 3 (2016), 206–222. doi:10.1037/ser0000075
  8. [8] Will Dobbie, Jacob Goldin, and Crystal S. Yang. 2018. The Effects of Pre-trial Detention on Conviction, Future Crime, and Employment: Evidence from Randomly Assigned Judges. American Economic Review 108, 2 (2018), 201–240. doi:10.1257/aer.20161503
  9. [9] Danielle Ensign, Sorelle A. Friedler, Scott Neville, Carlos Scheidegger, and Suresh Venkatasubramanian. 2018. Runaway Feedback Loops in Predictive Policing. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency (FAT*), Vol. 81. PMLR, 160–171.
  10. [10] Virginia Eubanks. 2018. Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. St. Martin's Press, New York, NY.
  11. [11] Ben Green. 2020. The False Promise of Risk Assessments: Epistemic Reform and the Limits of Fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAccT '20). ACM, 594–606. doi:10.1145/3351095.3372869
  12. [12] David J. Hand. 2009. Measuring Classifier Performance: A Coherent Alternative to the Area Under the ROC Curve. Machine Learning 77, 1 (2009), 103–123. doi:10.1007/s10994-009-5119-5
  13. [13] Bernard E. Harcourt. 2007. Against Prediction: Profiling, Policing, and Punishing in an Actuarial Age. University of Chicago Press, Chicago, IL.
  14. [14] John P. A. Ioannidis. 2005. Why Most Published Research Findings Are False. PLoS Medicine 2, 8 (2005), e124. doi:10.1371/journal.pmed.0020124
  15. [15] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. 2017. Inherent Trade-Offs in the Fair Determination of Risk Scores. In Proceedings of the 8th Innovations in Theoretical Computer Science Conference (ITCS). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 43:1–43:23.
  16. [16] Andreas Laupacis, David L. Sackett, and Robin S. Roberts. 1988. An Assessment of Clinically Useful Measures of the Consequences of Treatment. New England Journal of Medicine 318, 26 (1988), 1728–1733. doi:10.1056/NEJM198806303182605
  17. [17] Christopher T. Lowenkamp, Marie VanNostrand, and Alexander Holsinger. 2013. The Hidden Costs of Pretrial Detention. Technical Report. Laura and John Arnold Foundation.
  18. [18] Kristian Lum and William Isaac. 2016. To Predict and Serve? Significance 13, 5 (2016), 14–19. doi:10.1111/j.1740-9713.2016.00960.x
  19. [19] Sandra G. Mayson. 2018. Dangerous Defendants. Yale Law Journal 127, 3 (2018), 490–568.
  20. [20] C. M. A. McCauliff. 1982. Burdens of Proof: Degrees of Belief, Quanta of Evidence, or Constitutional Guarantees? Vanderbilt Law Review 35, 6 (1982), 1293–1335.
  21. [21] Alexandru Niculescu-Mizil and Rich Caruana. 2005. Predicting Good Probabilities with Supervised Learning. In Proceedings of the 22nd International Conference on Machine Learning (ICML '05). ACM, 625–632. doi:10.1145/1102351.1102430
  22. [22] NYC Mayor's Office of Criminal Justice. 2021. Pretrial Docketed Rearrest: Contextual Overview — December 2021 Update. https://criminaljustice.cityofnewyork.us/wp-content/uploads/2021/12/Pretrial-Docketed-Rearrest-Contextual-Overview-December-2021-Update.pdf. Accessed: 2026-04-29.
  23. [23] John C. Platt. 1999. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. In Advances in Large Margin Classifiers, Alexander J. Smola, Peter Bartlett, Bernhard Schölkopf, and Dale Schuurmans (Eds.). MIT Press, 61–74.
  24. [24] Cynthia Rudin. 2019. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nature Machine Intelligence 1, 5 (2019), 206–215. doi:10.1038/s42256-019-0048-y
  25. [25] Wendy Sawyer and Peter Wagner. 2023. Mass Incarceration: The Whole Pie 2023. https://www.prisonpolicy.org/reports/pie2023.html
  26. [26] Jennifer L. Skeem and John Monahan. 2011. Current Directions in Violence Risk Assessment. Current Directions in Psychological Science 20, 1 (2011), 38–42. doi:10.1177/0963721410397240
  27. [27] Sonja B. Starr. 2014. Evidence-Based Sentencing and the Scientific Rationalization of Discrimination. Stanford Law Review 66, 4 (2014), 803–872.
  28. [28] Megan T. Stevenson. 2018. Assessing Risk Assessment in Action. Minnesota Law Review 103, 1 (2018), 303–384.
  29. [29] Megan T. Stevenson and Sandra G. Mayson. 2022. Pretrial Detention and the Value of Liberty. Virginia Law Review 108, 3 (2022), 709–782.
  30. [30] Supreme Court of the United States. 1979. Addington v. Texas. 441 U.S. 418.
  31. [31] United States Congress. 1984. Bail Reform Act of 1984. Pub. L. No. 98–473, 98 Stat. 1976 (codified at 18 U.S.C. § 3142).
  32. [32] Bianca Zadrozny and Charles Elkan. 2002. Transforming Classifier Scores into Accurate Multiclass Probability Estimates. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '02). ACM, 694–699. doi:10.1145/775047.775151