Wood, Natalya Pya, and Benjamin Säfken
Canonical reference. 80% of citing Pith papers cite this work as background.
Citing papers
-
Computationally tractable robust differentially private mean estimation
The balloon mean is a computationally tractable robust differentially private mean estimator with theoretical guarantees under heavy-tailed contaminated elliptical models.
-
Disentangling Latent Risk Pathways via Bayesian Hypergraph Inference
A Bayesian hypergraph inference method models EHR multi-disease risk by letting risk factors modulate latent hyperedges (disease subsets) with repulsion priors and structured variational inference for uncertainty and scalability.
-
APIC: Amortized Physics-Informed Calibration using Neural Processes
APIC applies Neural Processes in a two-branch latent model to amortize Kennedy-O'Hagan-style calibration, separating instance-specific parameters from shared structural discrepancies for fast inference on new realizations.
-
Semiparametric Elliptical Mixture Clustering for High-Dimensional Data
A semiparametric framework clusters high-dimensional, heavy-tailed elliptical data via cluster-specific centers, a common unknown radial generator, and a shared sparse precision matrix, with a GEM algorithm and high-dimensional consistency guarantees.
-
GenAI Powered Dynamic Causal Inference with Unstructured Data
A GenAI-based method extracts representations from unstructured data and uses a neural network to fit marginal structural models that recover causal effects of treatment feature sequences including their positions.
-
Hierarchical Probabilistic Principal Component Analysis of Longitudinal Data
HPPCA is a hierarchical extension of PPCA that uses Gaussian processes to model within-subject dynamics in longitudinal data, outperforming standard PPCA and functional PCA in imputation under missingness and misspecification.
-
Bayesian Global-Local Shrinkage with Univariate Guidance for Ultra-High-Dimensional Regression
BUGS embeds univariate marginal guidance into a regularized horseshoe prior to induce adaptive shrinkage, supplies theoretical contraction guarantees, and offers an active-set MCMC approximation that scales to p=1,000,000 while improving false-discovery control.
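The regularized horseshoe prior that BUGS starts from can be illustrated on its own. The sketch below is a minimal, assumed version of that base prior only: local scales lam_j drawn from a half-Cauchy(0, 1), a global scale `tau`, and a slab width `c`, giving prior variance tau^2 c^2 lam_j^2 / (c^2 + tau^2 lam_j^2). BUGS's univariate marginal guidance and its active-set MCMC are not reproduced, and the function name is hypothetical.

```python
import math
import random


def sample_regularized_horseshoe(p, tau, c, rng):
    """Draw p coefficients from a regularized horseshoe prior (hypothetical helper).

    Local scales lam_j ~ half-Cauchy(0, 1); the slab width c caps the prior
    variance: var_j = tau^2 * c^2 * lam_j^2 / (c^2 + tau^2 * lam_j^2) < c^2.
    Returns (betas, variances).
    """
    betas, variances = [], []
    for _ in range(p):
        # Half-Cauchy(0, 1) via the inverse CDF F^{-1}(u) = tan(pi * u / 2).
        lam = math.tan(math.pi * rng.random() / 2.0)
        lam_tilde_sq = c**2 * lam**2 / (c**2 + tau**2 * lam**2)
        var = tau**2 * lam_tilde_sq
        variances.append(var)
        betas.append(rng.gauss(0.0, math.sqrt(var)))
    return betas, variances


rng = random.Random(0)
betas, variances = sample_regularized_horseshoe(1000, tau=0.01, c=2.0, rng=rng)
# Most draws sit near zero (strong global shrinkage), while heavy-tailed local
# scales are capped by the slab instead of growing without bound.
print(max(abs(b) for b in betas), max(variances))
```

The slab is what distinguishes the regularized version from the plain horseshoe: without it, a very large lam_j leaves the corresponding coefficient essentially unshrunk.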
-
Bayesian Multivariate Sparse Functional Principal Components Analysis
MSFAST extends the FAST FPCA method to multivariate sparse data via Bayesian modeling with orthonormal splines, standardization, Procrustes alignment, and efficient computation, yielding valid inferences especially in low signal-to-noise settings.
-
Mean-Field Analysis of Latent Variable Process Models on Dynamically Evolving Graphs with Feedback Effects
Characterizes the distributional mean-field limit of co-evolving latent space networks with feedback, including empirical measures and graphon convergence, via a conditional propagation of chaos result.
-
Structured Secant Methods to Select Smoothing Parameters For General Smooth Models
A structured secant quasi-Newton method (qEFS) for smoothing parameter selection in general smooth models; it approximates the Hessian and is easier to implement than exact second-order methods.
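qEFS itself is a structured secant method; the basic secant idea it builds on can be sketched with the plain BFGS update, which is swapped in here for illustration and is not qEFS. The helper names `matvec` and `bfgs_update` are hypothetical. The key property is the secant condition: after the update, the Hessian approximation maps the step s exactly onto the gradient change y.

```python
def matvec(A, v):
    """Dense matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def bfgs_update(B, s, y):
    """BFGS update of a symmetric Hessian approximation B (list of lists).

    s = x_new - x_old and y = grad_new - grad_old. The result
        B_new = B - (B s)(B s)^T / (s^T B s) + y y^T / (y^T s)
    stays symmetric and satisfies the secant condition B_new s = y.
    """
    Bs = matvec(B, s)
    sBs = sum(a * b for a, b in zip(s, Bs))
    ys = sum(a * b for a, b in zip(y, s))
    if ys <= 0.0:
        raise ValueError("curvature condition y^T s > 0 is violated")
    n = len(s)
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + y[i] * y[j] / ys
             for j in range(n)] for i in range(n)]


# Toy quadratic f(x) = 0.5 x^T A x, whose gradient is A x, so y = A s.
A = [[3.0, 1.0], [1.0, 2.0]]
s = [1.0, -0.5]
y = matvec(A, s)
B_new = bfgs_update([[1.0, 0.0], [0.0, 1.0]], s, y)
print(matvec(B_new, s), y)  # the two vectors agree
```

Each update costs only vector products of gradient differences, which is the practical appeal over computing exact second derivatives.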
-
Two-Sample Homogeneity Test via Entropic Optimal Transport
Proposes and analyzes a two-sample homogeneity test based on the squared L2 distance between empirical entropic optimal transport (EOT) maps to a uniform-on-ball reference, with a functional central limit theorem (FCLT), a Gaussian quadratic null limit, consistency, local power analysis, and a weighted multiplier bootstrap.
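A minimal 1-D sketch of the ingredients named above, under stated assumptions: uniform empirical measures, squared-distance cost, log-domain Sinkhorn iterations, an evenly spaced grid on [-1, 1] standing in for the uniform-on-ball reference, and the entropic map taken as the barycentric projection of the plan onto the sample. The statistic is an illustrative mean squared gap between the two samples' maps on the reference, not the paper's exact statistic, and no bootstrap calibration is included. All function names are hypothetical.

```python
import math
import random


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def sinkhorn_plan(xs, us, eps, iters=300):
    """Entropic OT plan between uniform empirical measures on 1-D points xs and
    us with squared-distance cost, via log-domain Sinkhorn iterations."""
    n, m = len(xs), len(us)
    C = [[(x - u) ** 2 for u in us] for x in xs]
    f, g = [0.0] * n, [0.0] * m  # dual potentials
    for _ in range(iters):
        # Match row marginals (1/n each), then column marginals (1/m each).
        f = [-eps * math.log(n) - eps * logsumexp([(g[j] - C[i][j]) / eps for j in range(m)])
             for i in range(n)]
        g = [-eps * math.log(m) - eps * logsumexp([(f[i] - C[i][j]) / eps for i in range(n)])
             for j in range(m)]
    return [[math.exp((f[i] + g[j] - C[i][j]) / eps) for j in range(m)] for i in range(n)]


def entropic_map_at_reference(xs, us, eps):
    """Barycentric projection: for each reference point u_j, the plan-weighted
    average of the sample points coupled to it."""
    P = sinkhorn_plan(xs, us, eps)
    return [sum(P[i][j] * xs[i] for i in range(len(xs))) / sum(P[i][j] for i in range(len(xs)))
            for j in range(len(us))]


def eot_map_statistic(xs, ys, us, eps=0.5):
    """Mean squared gap between the two samples' entropic maps on the reference."""
    tx = entropic_map_at_reference(xs, us, eps)
    ty = entropic_map_at_reference(ys, us, eps)
    return sum((a - b) ** 2 for a, b in zip(tx, ty)) / len(us)


rng = random.Random(1)
m = 15
us = [-1.0 + 2.0 * (j + 0.5) / m for j in range(m)]  # grid stand-in for Uniform[-1, 1]
xs = [rng.gauss(0.0, 1.0) for _ in range(15)]
ys = [x + 1.0 for x in xs]  # same shape, location shifted by 1
print(eot_map_statistic(xs, xs, us), eot_map_statistic(xs, ys, us))
```

Identical samples give exactly 0. For the shifted sample the value should be close to 1: with squared cost, a translation adds terms that depend on only one of the two points, so the optimal entropic plan is unchanged and every mapped point moves by the shift.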
-
Improving Longitudinal Targeted Maximum Likelihood Estimation in Target Trial Emulation using Joint Calibrated Weights
Joint calibrated LTMLE integrates LTMLE with joint calibrated weights to improve finite-sample efficiency and robustness to misspecification for per-protocol effect estimation in target trial emulation.
-
SVI-Bench: A Dynamic Microworld for Strategic Video Intelligence
SVI-Bench is a 35K-hour sports video benchmark with 9 tasks across four cognitive pillars that reveals multimodal models drop from ~73% on action QA to 5% on agentic evidence-gathering tasks.
-
Infinite-Dimensional Spherical Kernel Ridge Regression
An intrinsic spherical kernel ridge regression framework is introduced for non-linear responses on spheres; the representer theorem reduces the infinite-dimensional estimation problem to a finite-dimensional one, and convergence rates are established.
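The representer-theorem reduction can be sketched in a few lines: the minimizer of (1/n) sum_i (y_i - f(x_i))^2 + lam ||f||^2 is f(x) = sum_i alpha_i k(x_i, x) with alpha = (K + n lam I)^{-1} y. This sketch assumes the kernel exp(kappa <x, z>) on the unit sphere S^2 (positive definite there), solves the linear system by plain Gaussian elimination, and is a generic kernel ridge regression on the sphere rather than the paper's intrinsic construction. All names are hypothetical.

```python
import math
import random


def sphere_kernel(x, z, kappa=2.0):
    """Assumed kernel on the unit sphere: k(x, z) = exp(kappa * <x, z>)."""
    return math.exp(kappa * sum(a * b for a, b in zip(x, z)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def fit_sphere_krr(points, y, lam=1e-4, kappa=2.0):
    """Representer theorem: only n coefficients alpha need to be estimated."""
    n = len(points)
    K = [[sphere_kernel(p, q, kappa) + (n * lam if i == j else 0.0)
          for j, q in enumerate(points)] for i, p in enumerate(points)]
    alpha = solve(K, y)
    return lambda x: sum(a * sphere_kernel(p, x, kappa) for a, p in zip(alpha, points))


def random_sphere_points(n, rng):
    """Uniform points on S^2 by normalizing standard Gaussian vectors."""
    pts = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        r = math.sqrt(sum(t * t for t in v))
        pts.append([t / r for t in v])
    return pts


rng = random.Random(0)
train = random_sphere_points(40, rng)
y = [p[2] for p in train]  # toy smooth target: the z-coordinate
f = fit_sphere_krr(train, y)
print(f([0.0, 0.0, 1.0]))  # prediction at the north pole
```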
-
Sample size and power calculations for causal inference with time-to-event outcomes
Derives new analytical sample size and power formulas for marginal hazard ratios in causal inference with time-to-event outcomes, applicable to randomized trials and observational studies via IPW estimators.
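For orientation, the classical Schoenfeld approximation for a two-arm randomized trial is sketched below: the required number of events is d = (z_{1-alpha/2} + z_{1-beta})^2 / (p (1 - p) (log HR)^2), where p is the allocation fraction. This is a conventional log-rank/Cox baseline, not the marginal-hazard-ratio formulas the paper derives, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hr, alpha=0.05, power=0.8, p_treated=0.5):
    """Events needed to detect hazard ratio `hr` with a two-sided level-alpha
    test (Schoenfeld's approximation for a two-arm randomized trial)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = z.inv_cdf(power)
    d = (z_alpha + z_beta) ** 2 / (p_treated * (1.0 - p_treated) * math.log(hr) ** 2)
    return math.ceil(d)


def schoenfeld_power(hr, events, alpha=0.05, p_treated=0.5):
    """Approximate power of the same test for a given number of events."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    return z.cdf(abs(math.log(hr)) * math.sqrt(events * p_treated * (1.0 - p_treated)) - z_alpha)


d = schoenfeld_events(0.7)
print(d, round(schoenfeld_power(0.7, d), 3))  # 247 events at HR = 0.7, 80% power
```

The formula counts events, not patients; converting it to a sample size requires an assumed probability of observing the event during follow-up.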
-
Neural Backward Filtering Forward Guiding
NBFFG combines a closed-form backward filter from a linear-Gaussian proxy process with a learned neural residual to enable efficient variational inference and unbiased pathwise subsampling for nonlinear diffusions on trees.
-
A penalized distributed lag non-linear Lee-Carter framework for regional weekly mortality forecasting
The paper introduces a penalized distributed lag non-linear Lee-Carter framework that adds temperature and influenza effects, negative binomial overdispersion, SARIMA dynamics, and copula dependence, improving regional weekly mortality forecasts on French data from 1990 to 2019.
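The classical Lee-Carter core that this framework extends, log m_{x,t} = a_x + b_x k_t, can be fitted as a rank-1 decomposition of the centered log-rate matrix. The sketch below assumes a complete age-by-period matrix of log death rates, uses power iteration in place of a full SVD, and forecasts k_t with the classical random walk with drift; the temperature, influenza, negative binomial, SARIMA, and copula components are not included. Function names are hypothetical.

```python
import math


def fit_lee_carter(log_m, iters=100):
    """Classical Lee-Carter fit log m[x][t] ~ a[x] + b[x] * k[t].

    a[x] is the row mean; (b, k) is the leading rank-1 term of the centered
    matrix, found by power iteration and normalized so that sum(b) == 1.
    Every centered row sums to zero, so sum(k) == 0 as well.
    Assumes the raw b's do not sum to zero (true when all ages move together).
    """
    X, T = len(log_m), len(log_m[0])
    a = [sum(row) / T for row in log_m]
    Z = [[log_m[x][t] - a[x] for t in range(T)] for x in range(X)]
    # Start from the largest centered row (a vector in the row space of Z).
    v = list(max(Z, key=lambda row: sum(c * c for c in row)))
    for _ in range(iters):
        u = [sum(Z[x][t] * v[t] for t in range(T)) for x in range(X)]  # Z v
        w = [sum(Z[x][t] * u[x] for x in range(X)) for t in range(T)]  # Z^T Z v
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    u = [sum(Z[x][t] * v[t] for t in range(T)) for x in range(X)]
    scale = sum(u)
    b = [ux / scale for ux in u]
    k = [vt * scale for vt in v]
    return a, b, k


def forecast_k(k, horizon):
    """Random walk with drift, the classical Lee-Carter time-series model."""
    drift = (k[-1] - k[0]) / (len(k) - 1)
    return [k[-1] + drift * h for h in range(1, horizon + 1)]


# Synthetic, exactly rank-1 log rates for 3 age groups and 5 periods.
a_true = [-6.0, -5.0, -4.0]
b_true = [0.2, 0.3, 0.5]
k_true = [2.0, 1.0, 0.0, -1.0, -2.0]
log_m = [[ax + bx * kt for kt in k_true] for ax, bx in zip(a_true, b_true)]
a, b, k = fit_lee_carter(log_m)
print(b, k, forecast_k(k, 2))
```

On exactly rank-1 input with sum(b) = 1 and sum(k) = 0, the fit recovers `b_true` and `k_true` up to floating-point rounding, since those two constraints pin down the otherwise arbitrary scaling of b and k.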
-
Interplay between pitch control and top speed in soccer: The stamina factor
A soccer pitch control model with heterogeneous player top speeds finds role-dependent positive correlations and a logarithmic effect from a stamina factor that adjusts speed.
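A bare-bones arrival-time version of pitch control, in which each player's top speed enters directly, is sketched below. The reaction time (0.7 s), the logistic scale (sigma = 0.45 s), and the function names are assumed illustrative values, not the paper's; the stamina factor is not modelled.

```python
import math


def arrival_time(pos, target, top_speed, reaction_time=0.7):
    """Hypothetical time (s) for a player at `pos` to reach `target` (metres):
    a fixed reaction delay, then a straight run at constant top speed (m/s)."""
    return reaction_time + math.dist(pos, target) / top_speed


def home_control(target, home, away, sigma=0.45):
    """Simplified control of `target` by the home team: a logistic function of
    the gap between the fastest away arrival and the fastest home arrival.
    `home` and `away` are lists of ((x, y), top_speed) pairs."""
    t_home = min(arrival_time(p, target, s) for p, s in home)
    t_away = min(arrival_time(p, target, s) for p, s in away)
    return 1.0 / (1.0 + math.exp(-math.pi / (math.sqrt(3.0) * sigma) * (t_away - t_home)))


target = (10.0, 0.0)
away = [((20.0, 0.0), 7.0)]
print(home_control(target, [((0.0, 0.0), 7.0)], away))  # 0.5: equidistant, equal speed
print(home_control(target, [((0.0, 0.0), 8.0)], away))  # faster home player
```

Letting top speed vary per player, rather than fixing one league-wide value, is what makes control depend on who is on the pitch and not only on where they stand.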
-
Neural CSI Compression Fine-Tuning: Taming the Communication Cost of Model Updates
Cost-aware full-model fine-tuning with joint entropy coding and a structured sparsity prior improves the rate-distortion performance of neural CSI compression under distribution shifts.
-
Constrained Weighted Bayesian Bootstrap
Constrained weighted Bayesian bootstrap extends the weighted Bayesian bootstrap to constrained posteriors, with asymptotics matching the restricted MLE, and is demonstrated on option pricing.
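The weighted Bayesian bootstrap idea can be sketched on a toy constrained problem: a scalar mean with squared-error loss and a lower-bound constraint. This is an assumed example, not the option-pricing application. Each draw re-solves the constrained problem under fresh flat Dirichlet weights; the function names are hypothetical.

```python
import random


def dirichlet_weights(n, rng):
    """Flat Dirichlet(1, ..., 1) weights via normalized Exp(1) draws."""
    g = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
    total = sum(g)
    return [v / total for v in g]


def constrained_wbb_mean(data, lower, draws=1000, seed=0):
    """Weighted Bayesian bootstrap draws for a mean constrained to be >= lower.

    Each draw minimizes sum_i w_i (x_i - theta)^2 subject to theta >= lower;
    for this convex 1-D problem the solution is max(weighted mean, lower).
    """
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        w = dirichlet_weights(len(data), rng)
        weighted_mean = sum(wi * xi for wi, xi in zip(w, data))
        out.append(max(weighted_mean, lower))
    return out


data = [-0.3, 0.1, -0.2, 0.4, -0.5, 0.0]
draws = constrained_wbb_mean(data, lower=0.0, draws=500)
share_on_boundary = sum(d == 0.0 for d in draws) / len(draws)
print(share_on_boundary)
```

Because the sample mean here is slightly negative, many draws land exactly on the boundary; this point mass at the constraint is the behavior a constrained posterior approximation has to capture.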
-
Policy Learning with Observational Data: The Case of Hepatitis C Treatment for HIV/HCV Co-Infected Patients
A weighted K-means plus decision-tree pipeline learns multi-action policies from observational data and is applied to HCV treatment choices for HIV co-infected patients, finding a high-clearance subgroup and potential cost savings of CAN$3.6-4.9 million.
-
Feature Screening for High-Dimensional Structural Break Predictive Regression
Develops SICS and RCRS screening methods for consistent selection of sparse active predictors and change points in high-dimensional structural break predictive regressions that may involve stationary or cointegrated series.
-
Time-dependent structural equation modeling of fans' football fever using activity tracking data during the 2025 DFB Cup final
Football fever in spectators follows a V-shaped time course captured as a latent process from heart rate and stress data via time-dependent structural equation modeling.
-
Moving towards informative and actionable social media research
Social media research yields inconclusive causal findings due to system complexity, and progress requires mechanistic explanations that integrate observational and experimental approaches while recognizing their shared limitations.
-
Built Environment Reasoning from Remote Sensing Imagery Using Large Vision-Language Models
Large vision-language models applied to multi-scale remote sensing imagery can generate recommendations on built environment design, constructability, land use, and risks for smart city decision-making.
-
Smart Ensemble Learning Framework for Predicting Groundwater Heavy Metal Pollution
Ensemble learning with a Gaussian copula transformation predicts the groundwater heavy metal pollution index with high accuracy (R²=0.96) while identifying key contaminants via clustering.
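A Gaussian copula transformation is often implemented as a rank-based normal-scores transform of each variable. The sketch below assumes that form (average ranks for ties, scores Phi^{-1}(r / (n + 1))); the paper's exact preprocessing and its ensemble models are not shown, and the function name is hypothetical.

```python
from statistics import NormalDist


def normal_scores(values):
    """Rank-based Gaussian copula (normal-scores) transform of one variable.

    Each value maps to Phi^{-1}(r / (n + 1)), where r is its 1-based rank;
    tied values share their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a block of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf(r / (n + 1)) for r in ranks]


concentrations = [3.2, 0.5, 10.0, 0.5, 7.1]  # hypothetical skewed measurements
print([round(z, 3) for z in normal_scores(concentrations)])
```

The transform keeps only the ordering of each variable, so heavy right tails typical of contaminant concentrations no longer dominate downstream models.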
-
KAPLAN: Kolmogorov-Arnold Prognostic Learnable Activation Networks for Survival Analysis