Multi-source transfer learning incurs an intrinsic adaptation cost that can exceed one, with phase transitions separating regimes where bias-agnostic estimators match oracle performance from those where they cannot.
PPI++: Efficient Prediction-Powered Inference
Mixed citation behavior. Most common role is method (50%).
abstract
We present PPI++: a computationally lightweight methodology for estimation and inference based on a small labeled dataset and a typically much larger dataset of machine-learning predictions. The methods automatically adapt to the quality of available predictions, yielding easy-to-compute confidence sets -- for parameters of any dimensionality -- that always improve on classical intervals using only the labeled data. PPI++ builds on prediction-powered inference (PPI), which targets the same problem setting, improving its computational and statistical efficiency. Real and synthetic experiments demonstrate the benefits of the proposed adaptations.
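To illustrate how an estimator can "automatically adapt to the quality of available predictions", here is a minimal sketch of a power-tuned, prediction-powered estimate of a mean. It is not the authors' implementation or any library's API. The function name `ppi_pp_mean`, the pooled variance estimate, and the clipping of the tuning weight to [0, 1] are assumptions made for this example. The weight `lam` is chosen to minimize a plug-in estimate of the estimator's variance, so it shrinks toward 0 when predictions are uninformative and approaches 1 when they track the labels closely.

```python
import numpy as np

def ppi_pp_mean(y, f_lab, f_unlab, z=1.96):
    """Sketch of a power-tuned prediction-powered mean estimate.

    y       : labels on the small labeled set (length n)
    f_lab   : model predictions on the labeled set (length n)
    f_unlab : model predictions on the large unlabeled set (length N)
    z       : normal quantile for the interval (1.96 gives roughly 95%)
    Returns (estimate, (lower, upper), lam).
    """
    y = np.asarray(y, dtype=float)
    f_lab = np.asarray(f_lab, dtype=float)
    f_unlab = np.asarray(f_unlab, dtype=float)
    n, N = len(y), len(f_unlab)

    var_y = np.var(y, ddof=1)
    # Assumption of this sketch: one pooled variance estimate for the predictions.
    var_f = np.var(np.concatenate([f_lab, f_unlab]), ddof=1)
    cov_yf = np.cov(y, f_lab, ddof=1)[0, 1]

    # Weight minimizing the plug-in variance below; clipped to [0, 1] by choice.
    lam = cov_yf / ((1.0 + n / N) * var_f) if var_f > 0 else 0.0
    lam = float(np.clip(lam, 0.0, 1.0))

    # Labeled mean plus a weighted correction from the predictions.
    theta = y.mean() + lam * (f_unlab.mean() - f_lab.mean())

    # Plug-in variance: Var(Y - lam*f)/n + lam^2 * Var(f)/N.
    var_theta = (var_y - 2.0 * lam * cov_yf) / n + lam**2 * var_f * (1.0 / n + 1.0 / N)
    half = z * np.sqrt(max(var_theta, 0.0))
    return theta, (theta - half, theta + half), lam
```

With `lam = 0` the estimate reduces to the classical labeled-data mean, so under this plug-in variance the interval is never wider than the classical one. The sketch covers only a scalar mean, whereas the abstract describes confidence sets for parameters of any dimensionality.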
representative citing papers
Introduces a parametric reservation-index policy with GMM estimation and UCB exploration for contextual LLM cascading under output-mediated feedback, claiming dimension-dependent square-root regret.
Multi-task PPI framework uses cross-task recalibration to improve inference power across related tasks, with a proof that gains require nonlinear proxy-ground-truth structure, shown on synthetic data and a 2024 election LM audit case study.
PUMA uses model averaging to jointly handle uncertainties from model misspecification, tuning, and ML choice, delivering asymptotic in-sample and out-of-sample prediction optimality plus estimation consistency.
An MOE-powered PPI framework adaptively blends multiple predictors to achieve minimal variance and a best-expert guarantee for semi-supervised mean estimation, linear regression, quantile estimation, and M-estimation, supported by non-asymptotic coverage bounds.
A coupled-label bootstrap provides valid inference for OLS regressions that use AI/ML-generated binary labels despite misclassification errors, unlike standard fixed-label bootstraps.
Post-hoc calibration of miscalibrated black-box predictions on a labeled sample improves efficiency of prediction-powered inference for semisupervised mean estimation.
The MLA-UCB algorithm uses ML-generated surrogate rewards from auxiliary data to provably lower cumulative regret in multi-armed bandits, achieving asymptotic optimality under joint Gaussian assumptions without requiring knowledge of the true-surrogate covariance.
Active inference adapts label collection via ML uncertainty to deliver valid statistical inference with substantially fewer samples than standard non-adaptive methods across any data distribution.
X4Val learns transferable neural predictors from non-paired multi-domain data and incorporates them into control-variates estimators to reduce variance in real-world robotic policy evaluation by up to 38.4%.
Introduces convolution smoothing of the check-loss for prediction-powered quantile regression, derives asymptotics under misspecification, and proposes an ensemble estimator.
OPAL learns optimal smooth labeling policies from ML uncertainty scores to enable low-variance prediction-assisted inference with finite-sample coverage guarantees.
Active inference framework for U-statistics using augmented IPW to optimize label queries and minimize variance under budget constraints.
Doubly robust estimators that incorporate low-rank predictions enable valid finite-sample confidence intervals for best-model identification under adaptive sampling and without-replacement example selection in LLM evaluation.
Rectified AI priors, obtained by correcting AI-induced data laws before embedding them in techniques like Dirichlet process priors, reduce bias, improve credible interval coverage, and boost performance in tasks like skin disease classification.
Empirical Bayes rebiasing learns the bias distribution from paired noisy estimates to produce shorter calibrated intervals than full debiasing while maintaining coverage.
Bias-corrected LLM-as-a-Judge estimators can reverse true model orderings under shared calibration, and the paper supplies judge quality J and cross-model instability ΔJ as practical diagnostics for when such estimates are unreliable.
A meta-analytic framework estimates the resilience probability of a surrogate marker to the surrogate paradox in a new study by modeling deviations from functional relationships observed in completed trials.
DOPE is a Neyman-orthogonal one-step semiparametric estimator that removes first-order bias in functional estimates from neural operators by learning weights via Riesz regression.
A new e-statistic enables anytime-valid sequential testing by betting on predictions from unlabeled data, with non-trivial power for binary outcomes even under inaccurate predictions and label or concept shift.
A framework models proxy-primary outcome discrepancies as random effects at the parameter level, estimated from aggregated historical observations to calibrate inferences under distribution shifts.
Non-asymptotic analysis of prediction-powered mean estimation shows that no-regret learning for query probabilities converges to the maximum allowed constant value, independent of covariates.
Introduces the D2S3 semiparametric framework, which extends AIPW estimators to semi-supervised settings with MAR labeling, distribution shift, and decaying overlap, supplying corrected asymptotic rates instead of root-n convergence.
Kernel ridge regression combined with mRMR feature selection improves prediction of full benchmark scores from question subsets over existing efficient benchmarking techniques.
citing papers explorer
- Prediction-Powered Linear Regression: A Balance Between Interpretation and Prediction
  PUMA uses model averaging to jointly handle uncertainties from model misspecification, tuning, and ML choice, delivering asymptotic in-sample and out-of-sample prediction optimality plus estimation consistency.
- Calibeating Prediction-Powered Inference
  Post-hoc calibration of miscalibrated black-box predictions on a labeled sample improves efficiency of prediction-powered inference for semisupervised mean estimation.
- Valid Best-Model Identification for LLM Evaluation via Low-Rank Factorization
  Doubly robust estimators that incorporate low-rank predictions enable valid finite-sample confidence intervals for best-model identification under adaptive sampling and without-replacement example selection in LLM evaluation.