Trust Me, I'm a Doctor?
Pith reviewed 2026-05-12 01:56 UTC · model grok-4.3
The pith
Combined randomized and observational data yield sharp bounds on how many physicians outperform the trial's best fixed treatment.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We define a gain score that formalizes the comparison between a physician's personal treatment strategy and the strategy of always choosing the treatment that performed better on average in the randomized trial. Using outcomes observed under treatment, control, and usual care in a nested design, we derive sharp bounds on the proportion of physicians whose personal strategies perform at least as well as, or better than, the trial's better-performing treatment.
What carries the argument
The gain score, which compares each physician's observed outcomes to those expected from always selecting the trial's better treatment, together with the sharp bounds on the fraction of physicians for whom this score is nonnegative.
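To make the gain-score comparison concrete, here is a minimal sketch under loudly simplified assumptions that are not taken from the paper. There is a single binary outcome (1 = success), the gain is taken to be Y(usual care) minus Y(better trial arm), and only the two marginal success rates are known. Under those assumptions, the sharp bounds on P(gain ≥ 0) are the classical Fréchet–Hoeffding bounds. The function names and the binary setup are hypothetical. The paper's own gain score, and its bounds on the proportion of physicians, may be defined differently.

```python
def better_arm(p_treat: float, p_control: float) -> str:
    """Return the trial arm with the higher success probability (ties go to treatment)."""
    return "treatment" if p_treat >= p_control else "control"


def frechet_bounds_nonneg_gain(p_usual: float, p_best: float) -> tuple[float, float]:
    """Sharp bounds on P(Y_usual - Y_best >= 0) for binary outcomes.

    Hypothetical simplification: only P(Y_usual = 1) (from the observational
    cohort) and P(Y_best = 1) (from the trial's better arm) are known; the
    joint distribution of the pair is never observed.
    """
    for p in (p_usual, p_best):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # The gain is negative only in the cell (Y_usual = 0, Y_best = 1).
    # Its probability q ranges over the Frechet-Hoeffding interval
    # [max(0, p_best - p_usual), min(1 - p_usual, p_best)].
    q_min = max(0.0, p_best - p_usual)
    q_max = min(1.0 - p_usual, p_best)
    # P(gain >= 0) = 1 - q, so the interval endpoints swap roles.
    return 1.0 - q_max, 1.0 - q_min
```

For example, with hypothetical trial success rates of 0.6 (treatment) and 0.5 (control) and a usual-care success rate of 0.55, `better_arm` picks treatment. The bounds come out at about 0.55 and 0.95, so the marginals alone only place the share with a nonnegative gain somewhere in that interval.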
If this is right
- The data can place both lower and upper limits on the share of physicians who match or exceed the trial recommendation.
- When the lower bound is close to zero, the observed data supply little support for preferring physician discretion over the trial result.
- When the lower bound is high, the data imply that a substantial share of physicians do at least as well as always choosing the trial's better treatment.
- The bounds are sharp, meaning they cannot be tightened further without additional assumptions or data.
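Sharpness can be checked directly in the same hypothetical binary-outcome simplification (one outcome, gain = Y(usual care) minus Y(better arm)). Each endpoint of the Fréchet–Hoeffding interval is attained by a valid joint distribution with the observed marginals, so neither endpoint can be moved inward without extra assumptions. The sketch below builds those two extremal 2x2 tables with exact fractions. The names are illustrative, and the paper's own sharpness proof, which may cover richer outcomes, is not reproduced here.

```python
from fractions import Fraction


def joint_table(p_usual, p_best, q):
    """2x2 joint of (Y_usual, Y_best) with P(Y_usual=0, Y_best=1) = q.

    The other three cells are forced by the marginals
    P(Y_usual = 1) = p_usual and P(Y_best = 1) = p_best.
    """
    return {
        (0, 1): q,
        (0, 0): (1 - p_usual) - q,
        (1, 1): p_best - q,
        (1, 0): p_usual - p_best + q,
    }


def extremal_joints(p_usual, p_best):
    """Joints attaining the upper and lower bound on P(gain >= 0)."""
    q_min = max(Fraction(0), p_best - p_usual)  # fewest negative gains
    q_max = min(1 - p_usual, p_best)            # most negative gains
    return joint_table(p_usual, p_best, q_min), joint_table(p_usual, p_best, q_max)


def prob_nonneg_gain(joint):
    # The gain Y_usual - Y_best is negative only in cell (0, 1).
    return 1 - joint[(0, 1)]


if __name__ == "__main__":
    best, worst = extremal_joints(Fraction(11, 20), Fraction(3, 5))
    print(prob_nonneg_gain(best), prob_nonneg_gain(worst))
```

With usual-care and better-arm success rates of 11/20 and 3/5, both tables have nonnegative cells and reproduce the marginals. They give P(gain ≥ 0) = 19/20 and 11/20, which are exactly the two bounds.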
Where Pith is reading between the lines
- The same bounding approach could apply to other decision-makers whose choices are observed alongside trial data, such as teachers or managers.
- Simulations with known physician strategies could check how often the bounds correctly contain the true proportion in finite samples.
- Extending the gain score to account for patient covariates might narrow the bounds when treatment effects vary systematically.
- The framework highlights a general tension between average-effect evidence and individualized practice that appears in many fields beyond medicine.
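The simulation idea can be sketched as a small Monte Carlo check. It fixes a known population joint distribution of (Y_usual, Y_best) for a binary outcome and samples only the two marginals, as the nested design would. It then counts how often the plug-in bounds contain the known truth. The joint table, sample sizes, and function names are invented for illustration, the better arm is assumed known in advance, and plug-in bounds ignore sampling uncertainty. This is not the paper's inference procedure.

```python
import random


def bounds_from_marginals(p_usual, p_best):
    """Frechet-Hoeffding bounds on P(Y_usual >= Y_best) for binary outcomes."""
    q_min = max(0.0, p_best - p_usual)
    q_max = min(1.0 - p_usual, p_best)
    return 1.0 - q_max, 1.0 - q_min


def coverage(joint, n_cohort, n_trial, reps, seed=0):
    """Share of simulated datasets whose plug-in bounds contain the true P(gain >= 0).

    `joint` maps (y_usual, y_best) cells to probabilities (hypothetical truth).
    """
    rng = random.Random(seed)
    p_usual = joint[(1, 0)] + joint[(1, 1)]
    p_best = joint[(0, 1)] + joint[(1, 1)]
    truth = 1.0 - joint[(0, 1)]
    hits = 0
    for _ in range(reps):
        # Only marginals are sampled: usual-care outcomes from the cohort and
        # better-arm outcomes from the trial, never both outcomes for one unit.
        p_u_hat = sum(rng.random() < p_usual for _ in range(n_cohort)) / n_cohort
        p_b_hat = sum(rng.random() < p_best for _ in range(n_trial)) / n_trial
        lo, hi = bounds_from_marginals(p_u_hat, p_b_hat)
        hits += lo <= truth <= hi
    return truth, hits / reps


if __name__ == "__main__":
    example = {(0, 0): 0.25, (0, 1): 0.15, (1, 0): 0.20, (1, 1): 0.40}
    print(coverage(example, n_cohort=500, n_trial=200, reps=500))
```

In this made-up example the truth (0.85) sits well inside the population bounds [0.6, 1.0], so coverage is essentially complete. Moving the truth toward an endpoint is where finite-sample misses would start to appear.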
Load-bearing premise
A randomized trial is nested inside an observational cohort from the same target population so that outcomes can be seen under treatment, control, and usual care.
What would settle it
Collect a new dataset in which each physician's actual long-run success rate is measured directly, and compare the resulting empirical proportion of physicians who match or beat the trial's better treatment with the numerical bounds produced by the method; a value outside the interval would contradict the claim that the bounds are valid (and hence sharp) under the stated assumptions.
Original abstract
Clinical trials usually target average treatment effects, but treatment decisions are made for individuals. This tension motivates a common criticism of evidence-based medicine: a treatment that is beneficial on average may be inappropriate for a particular patient, and skilled physicians may outperform rigid adherence to the strategy that performed best in a randomized trial. We consider how randomized and observational data from the same target population can be used to assess that possibility. Specifically, we study settings in which a randomized trial is nested within an observational cohort, so that outcomes are observed under treatment, control, and usual care. We ask what the observed data can reveal about how often physicians outperform the strategy suggested by the trial. We define a gain score to formalize this comparison and derive sharp bounds on the proportion of physicians whose personal strategies perform at least as well as, or better than, always choosing the better performing treatment from the trial. These results shed light on when clinical data support relying on physician discretion over the trial-average recommendation and when stronger justification is required.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper addresses the tension between average treatment effects from clinical trials and individual physician decisions. In settings where a randomized trial is nested within an observational cohort, allowing observation of outcomes under treatment, control, and usual care, the authors define a gain score to compare physicians' strategies to the better-performing trial treatment. They derive sharp bounds on the proportion of physicians whose personal strategies perform at least as well as or better than always choosing the better trial arm.
Significance. If the derived bounds are sharp and the identification is correct, this provides a valuable partial identification framework for assessing when clinical data support physician discretion over trial recommendations. The nested design is cleverly used to obtain the three necessary marginal outcome distributions. This could have implications for evidence-based medicine debates. The approach avoids parametric assumptions by focusing on sharp bounds.
major comments (1)
- Abstract: The claim that sharp bounds are derived from the observed data structure is central, but the description leaves the exact identification assumptions and proof strategy implicit; without explicit verification that the bounds are sharp given only the three marginal distributions (and no additional restrictions), it is difficult to assess whether the result holds under the stated nested design alone.
minor comments (2)
- The gain score definition would benefit from an intuitive example or numerical illustration early in the text to clarify how it formalizes the comparison between physician strategies and the trial's better arm.
- Consider adding a brief sensitivity discussion on how the bounds change if the nested design assumption (randomized trial within the same target population) is mildly violated.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript and for recommending minor revision. The single major comment is addressed below.
Point-by-point responses
Referee: Abstract: The claim that sharp bounds are derived from the observed data structure is central, but the description leaves the exact identification assumptions and proof strategy implicit; without explicit verification that the bounds are sharp given only the three marginal distributions (and no additional restrictions), it is difficult to assess whether the result holds under the stated nested design alone.
Authors: We agree that the abstract would benefit from greater explicitness on this point. The nested design directly supplies the three marginal outcome distributions (under treatment, under control, and under usual care). The sharp bounds on the proportion of physicians whose strategies perform at least as well as the better trial arm (that is, have a nonnegative gain score) are obtained by optimizing over all joint distributions consistent with these marginals. Sharpness follows from the existence of extremal joint distributions that attain the bound values while respecting the observed marginals and the nested sampling structure, without parametric restrictions or further assumptions. The full identification argument and proof of sharpness appear in the main text and appendix. We will revise the abstract to state explicitly that the bounds are sharp given only the three observed marginal distributions from the nested design.
revision: yes
Circularity Check
No significant circularity; bounds derived from nested data structure
full rationale
The paper defines a gain score formalizing physician performance relative to the trial's better arm and derives sharp bounds on the proportion of physicians meeting or exceeding it. It uses the explicit nested design (a randomized trial inside an observational cohort) to obtain the three marginal outcome distributions under treatment, control, and usual care. The partial-identification argument takes these observed distributions directly as inputs; no step reduces the target quantity to a fitted parameter, a self-referential equation, or a self-citation chain. The abstract and the skeptic analysis both present the bounds as sharp given that data structure, with no internal redefinition of the target and no unstated assumptions smuggled in.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The randomized trial is nested within an observational cohort from the same target population, with outcomes observed under treatment, control, and usual care.
invented entities (1)
- gain score (no independent evidence)