
PPI++: Efficient Prediction-Powered Inference

Anastasios N. Angelopoulos, John C. Duchi, Tijana Zrnic · 2023 · stat.ML · arXiv 2311.01453

Mixed citation behavior. Most common role is method (50%).

22 Pith papers citing it

Method 50% of classified citations


abstract

We present PPI++: a computationally lightweight methodology for estimation and inference based on a small labeled dataset and a typically much larger dataset of machine-learning predictions. The methods automatically adapt to the quality of available predictions, yielding easy-to-compute confidence sets -- for parameters of any dimensionality -- that always improve on classical intervals using only the labeled data. PPI++ builds on prediction-powered inference (PPI), which targets the same problem setting, improving its computational and statistical efficiency. Real and synthetic experiments demonstrate the benefits of the proposed adaptations.
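The adaptation to prediction quality described in the abstract can be illustrated for the simplest target, a population mean. The following is a minimal sketch, not the authors' reference implementation: the function name `ppi_pp_mean`, its arguments, and the default level are hypothetical. It assumes the standard power-tuned form of the estimator, in which a weight `lam` on the predictions is estimated from the data. A weight of 0 recovers the classical labeled-only interval, so poor predictions are down-weighted rather than trusted.

```python
import numpy as np
from statistics import NormalDist


def ppi_pp_mean(y, f_lab, f_unlab, alpha=0.1):
    """Sketch of a power-tuned prediction-powered estimate of E[Y].

    y       : labels on the small labeled set (length n)
    f_lab   : model predictions on the labeled set (length n)
    f_unlab : model predictions on the large unlabeled set (length N)
    Returns (point estimate, (lower, upper), tuning weight lam).
    All names here are illustrative assumptions, not a library API.
    """
    y = np.asarray(y, dtype=float)
    f_lab = np.asarray(f_lab, dtype=float)
    f_unlab = np.asarray(f_unlab, dtype=float)
    n, N = len(y), len(f_unlab)

    # Plug-in estimate of the variance-minimizing weight:
    # lam = Cov(Y, f) / ((1 + n/N) * Var(f)).
    var_f = np.var(np.concatenate([f_lab, f_unlab]), ddof=1)
    if var_f > 0:
        cov_yf = np.cov(y, f_lab, ddof=1)[0, 1]
        lam = float(cov_yf / ((1 + n / N) * var_f))
    else:
        lam = 0.0  # constant predictions carry no information

    # Labeled mean plus a weighted correction from the predictions.
    theta = y.mean() + lam * (f_unlab.mean() - f_lab.mean())

    # Asymptotic variance: labeled residual term + unlabeled prediction term.
    var = np.var(y - lam * f_lab, ddof=1) / n
    var += lam**2 * np.var(f_unlab, ddof=1) / N

    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * np.sqrt(var)
    return float(theta), (float(theta - half), float(theta + half)), lam
```

When the predictions track the labels closely, `lam` lands near 1 and the interval is much narrower than the labeled-only interval. When they are uninformative, `lam` shrinks toward 0 and the result falls back to the classical estimate.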


citation-role summary

method 3 · background 2 · dataset 1

citation-polarity summary

use method 3 · background 2 · use dataset 1

representative citing papers

The Statistical Cost of Adaptation in Multi-Source Transfer Learning

math.ST · 2026-05-10 · unverdicted · novelty 8.0

Multi-source transfer learning incurs an intrinsic adaptation cost that can exceed one, with phase transitions separating regimes where bias-agnostic estimators match oracle performance from those where they cannot.

Prediction-Powered Inference Across Many Tasks for AI Evaluation & Social Science Research

stat.ML · 2026-05-28 · unverdicted · novelty 7.0

Multi-task PPI framework uses cross-task recalibration to improve inference power across related tasks, with a proof that gains require nonlinear proxy-ground-truth structure, shown on synthetic data and a 2024 election LM audit case study.

Prediction-Powered Linear Regression: A Balance Between Interpretation and Prediction

stat.ME · 2026-05-09 · unverdicted · novelty 7.0

PUMA uses model averaging to jointly handle uncertainties from model misspecification, tuning, and ML choice, delivering asymptotic in-sample and out-of-sample prediction optimality plus estimation consistency.

Prediction-powered Inference by Mixture of Experts

stat.ML · 2026-04-30 · unverdicted · novelty 7.0

An MOE-powered PPI framework adaptively blends multiple predictors to achieve minimal variance and a best-expert guarantee for semi-supervised mean estimation, linear regression, quantile estimation, and M-estimation, supported by non-asymptotic coverage bounds.

Bootstrapping with AI/ML-generated labels

econ.EM · 2026-04-26 · unverdicted · novelty 7.0

A coupled-label bootstrap provides valid inference for OLS regressions that use AI/ML-generated binary labels despite misclassification errors, unlike standard fixed-label bootstraps.

Calibeating Prediction-Powered Inference

stat.ML · 2026-04-23 · unverdicted · novelty 7.0

Post-hoc calibration of miscalibrated black-box predictions on a labeled sample improves efficiency of prediction-powered inference for semisupervised mean estimation.

Multi-Armed Bandits With Machine Learning-Generated Surrogate Rewards

math.ST · 2025-06-20 · unverdicted · novelty 7.0

The MLA-UCB algorithm uses ML-generated surrogate rewards from auxiliary data to provably lower cumulative regret in multi-armed bandits, achieving asymptotic optimality under joint Gaussian assumptions without requiring knowledge of the true-surrogate covariance.

Active Statistical Inference

stat.ML · 2024-03-05 · unverdicted · novelty 7.0

Active inference adapts label collection via ML uncertainty to deliver valid statistical inference with substantially fewer samples than standard non-adaptive methods across any data distribution.

Learning U-Statistics with Active Inference

stat.ML · 2026-05-12 · unverdicted · novelty 6.0

Active inference framework for U-statistics using augmented IPW to optimize label queries and minimize variance under budget constraints.

Valid Best-Model Identification for LLM Evaluation via Low-Rank Factorization

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Doubly robust estimators that incorporate low-rank predictions enable valid finite-sample confidence intervals for best-model identification under adaptive sampling and without-replacement example selection in LLM evaluation.

Supercharging Bayesian Inference with Reliable AI-Informed Priors

stat.ML · 2026-05-11 · unverdicted · novelty 6.0

Rectified AI priors, obtained by correcting AI-induced data laws before embedding them in techniques like Dirichlet process priors, reduce bias, improve credible interval coverage, and boost performance in tasks like skin disease classification.

Empirical Bayes Rebiasing

stat.ME · 2026-05-08 · unverdicted · novelty 6.0

Empirical Bayes rebiasing learns the bias distribution from paired noisy estimates to produce shorter calibrated intervals than full debiasing while maintaining coverage.

Bias and Uncertainty in LLM-as-a-Judge Estimation

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Bias-corrected LLM-as-a-Judge estimators can reverse true model orderings under shared calibration, and the paper supplies judge quality J and cross-model instability ΔJ as practical diagnostics for when such estimates are unreliable.

A Functional-Class Meta-Analytic Framework for Quantifying Surrogate Resilience

stat.ME · 2026-04-22 · unverdicted · novelty 6.0

A meta-analytic framework estimates the resilience probability of a surrogate marker to the surrogate paradox in a new study by modeling deviations from functional relationships observed in completed trials.

Debiased neural operators for estimating functionals

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

DOPE is a Neyman-orthogonal one-step semiparametric estimator that removes first-order bias in functional estimates from neural operators by learning weights via Riesz regression.

Semi-Supervised Hypothesis Testing by Betting on Predictions

cs.LG · 2026-05-27 · unverdicted · novelty 5.0

A new e-statistic enables anytime-valid sequential testing by betting on predictions from unlabeled data, with non-trivial power for binary outcomes even under inaccurate predictions and label or concept shift.

Estimate Level Adjustment For Inference With Proxies Under Random Distribution Shifts

stat.ME · 2026-05-07 · unverdicted · novelty 5.0

A framework models proxy-primary outcome discrepancies as random effects at the parameter level, estimated from aggregated historical observations to calibrate inferences under distribution shifts.

Revisiting Active Sequential Prediction-Powered Mean Estimation

stat.ML · 2026-04-20 · unverdicted · novelty 5.0

Non-asymptotic analysis of prediction-powered mean estimation shows that no-regret learning for query probabilities converges to the maximum allowed constant value, independent of covariates.

Semiparametric semi-supervised learning for general targets under distribution shift and decaying overlap

math.ST · 2025-05-09 · unverdicted · novelty 5.0

Introduces D2S3 semiparametric framework that extends AIPW estimators to semi-supervised settings with MAR labeling, distribution shift, and decaying overlap, supplying corrected asymptotic rates instead of root-n convergence.

Industrializing Prediction-Powered Inference: The GLIDE Library for Reliable GenAI and Agentic Systems Evaluation

cs.AI · 2026-05-29 · unverdicted · novelty 3.0

GLIDE is a Python library that packages multiple PPI estimators and samplers for reliable GenAI evaluation and reports annotation savings in an agentic case study.

High-Dimensional Statistics: Reflections on Progress and Open Problems

math.ST · 2026-05-06

Allocating Human Oversight in AI-Enabled Analytics

cs.LG · 2026-04-14

citing papers explorer

Showing 22 of 22 citing papers.

The Statistical Cost of Adaptation in Multi-Source Transfer Learning math.ST · 2026-05-10 · unverdicted · none · ref 188 · internal anchor
Prediction-Powered Inference Across Many Tasks for AI Evaluation & Social Science Research stat.ML · 2026-05-28 · unverdicted · none · ref 1 · internal anchor
Prediction-Powered Linear Regression: A Balance Between Interpretation and Prediction stat.ME · 2026-05-09 · unverdicted · none · ref 5 · internal anchor
Prediction-powered Inference by Mixture of Experts stat.ML · 2026-04-30 · unverdicted · none · ref 2 · internal anchor
Bootstrapping with AI/ML-generated labels econ.EM · 2026-04-26 · unverdicted · none · ref 3 · internal anchor
Calibeating Prediction-Powered Inference stat.ML · 2026-04-23 · unverdicted · none · ref 1 · internal anchor
Multi-Armed Bandits With Machine Learning-Generated Surrogate Rewards math.ST · 2025-06-20 · unverdicted · none · ref 3 · internal anchor
Active Statistical Inference stat.ML · 2024-03-05 · unverdicted · none · ref 3 · internal anchor
Learning U-Statistics with Active Inference stat.ML · 2026-05-12 · unverdicted · none · ref 13 · internal anchor
Valid Best-Model Identification for LLM Evaluation via Low-Rank Factorization cs.LG · 2026-05-11 · unverdicted · none · ref 14 · internal anchor
Supercharging Bayesian Inference with Reliable AI-Informed Priors stat.ML · 2026-05-11 · unverdicted · none · ref 1 · internal anchor
Empirical Bayes Rebiasing stat.ME · 2026-05-08 · unverdicted · none · ref 1 · internal anchor
Bias and Uncertainty in LLM-as-a-Judge Estimation cs.LG · 2026-05-07 · unverdicted · none · ref 2 · internal anchor
A Functional-Class Meta-Analytic Framework for Quantifying Surrogate Resilience stat.ME · 2026-04-22 · unverdicted · none · ref 53 · internal anchor
Debiased neural operators for estimating functionals cs.LG · 2026-04-21 · unverdicted · none · ref 2 · internal anchor
Semi-Supervised Hypothesis Testing by Betting on Predictions cs.LG · 2026-05-27 · unverdicted · none · ref 1 · internal anchor
Estimate Level Adjustment For Inference With Proxies Under Random Distribution Shifts stat.ME · 2026-05-07 · unverdicted · none · ref 3 · internal anchor
Revisiting Active Sequential Prediction-Powered Mean Estimation stat.ML · 2026-04-20 · unverdicted · none · ref 1 · internal anchor
Semiparametric semi-supervised learning for general targets under distribution shift and decaying overlap math.ST · 2025-05-09 · unverdicted · none · ref 1 · internal anchor
Industrializing Prediction-Powered Inference: The GLIDE Library for Reliable GenAI and Agentic Systems Evaluation cs.AI · 2026-05-29 · unverdicted · none · ref 1 · internal anchor
High-Dimensional Statistics: Reflections on Progress and Open Problems math.ST · 2026-05-06 · unreviewed · ref 2 · internal anchor
Allocating Human Oversight in AI-Enabled Analytics cs.LG · 2026-04-14 · unreviewed · ref 5 · internal anchor
