pith. sign in

Predictions as surrogates: Revisiting surrogate outcomes in the age of ai.arXiv preprint arXiv:2501.09731

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it
abstract

We establish a formal connection between the decades-old surrogate outcome model in biostatistics and economics and the emerging field of prediction-powered inference (PPI). The connection treats predictions from pre-trained models, prevalent in the age of AI, as cost-effective surrogates for expensive outcomes. Building on the surrogate outcomes literature, we develop recalibrated prediction-powered inference, a more efficient approach to statistical inference than existing PPI proposals. Our method departs from the existing proposals by using flexible machine learning techniques to learn the optimal ``imputed loss'' through a step we call recalibration. Importantly, the method always improves upon the estimator that relies solely on the data with available true outcomes, even when the optimal imputed loss is estimated imperfectly, and it achieves the smallest asymptotic variance among PPI estimators if the estimate is consistent. Computationally, our optimization objective is convex whenever the loss function that defines the target parameter is convex. We further analyze the benefits of recalibration, both theoretically and numerically, in several common scenarios where machine learning predictions systematically deviate from the outcome of interest. We demonstrate significant gains in effective sample size over existing PPI proposals via three applications leveraging state-of-the-art machine learning/AI models.

citation-role summary

method 1

citation-polarity summary

years

2026 5 2025 3

roles

method 1

polarities

use method 1

clear filters

representative citing papers

Calibeating Prediction-Powered Inference

stat.ML · 2026-04-23 · unverdicted · novelty 7.0

Post-hoc calibration of miscalibrated black-box predictions on a labeled sample improves efficiency of prediction-powered inference for semisupervised mean estimation.

Multi-Armed Bandits With Machine Learning-Generated Surrogate Rewards

math.ST · 2025-06-20 · unverdicted · novelty 7.0

The MLA-UCB algorithm uses ML-generated surrogate rewards from auxiliary data to provably lower cumulative regret in multi-armed bandits, achieving asymptotic optimality under joint Gaussian assumptions without requiring knowledge of the true-surrogate covariance.

citing papers explorer

Showing 1 of 1 citing paper after filters.