hub
Regularization and Variable Selection Via the Elastic Net
26 Pith papers cite this work. Polarity classification is still indexing.
fields
stat.ME 9, cs.LG 7, math.ST 2, astro-ph.CO 1, cs.CL 1, cs.CV 1, cs.HC 1, econ.EM 1, stat.AP 1, stat.CO 1
years
2026 26
citing papers explorer
- Private Rate-Double-Robust Inference
  Local privacy mechanisms preserve rate-double-robustness, enabling unbiased and semiparametrically efficient inference on target parameters indexed linearly by infinite-dimensional and nonlinearly by low-dimensional components from noisy private data.
- Causal Inference for Functional Treatments with Stochastic Policies
  Develops stochastic policies and single-basis-function modification for causal inference on functional treatments, proves asymptotic normality and rate double robustness, and applies to NHANES physical activity and mortality data.
- Exact Comparison of Explanatory Strength of Two Dependent Predictors
  The Paired Swap Permutation Test is an exact non-parametric procedure that compares explanatory power of two dependent predictors via symmetric within-subject swapping for categorical data and ECDF mapping for continuous data.
- Covariance Shrinkage via Stochastic Interpolation
  Recasts covariance shrinkage as risk minimization over stochastic interpolants between distributions, recovering known estimators via scheduling, couplings, and early stopping, and proposing a neural estimator with quadratic risk bounds.
- Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability
  SASA replaces single-vector decoders in SAEs with learned subspaces plus block sparsity and nuclear-norm regularization, proving that a single group becomes the global minimizer once block size meets intrinsic dimension and yielding polynomial rather than exponential sample complexity.
- Sparse Tree-Based Aggregation for Time Series Regressions
  StarTime uses a hierarchical temporal tree to enable sparse or aggregated coefficient selection in high-order autoregressions and mixed-frequency regressions, with new error bounds and simulation improvements over benchmarks.
- Bayesian Global-Local Shrinkage with Univariate Guidance for Ultra-High-Dimensional Regression
  BUGS embeds univariate marginal guidance into a regularized horseshoe prior to induce adaptive shrinkage, supplies theoretical contraction guarantees, and offers an active-set MCMC approximation that scales to p=1,000,000 while improving false-discovery control.
- Online forecast reconciliation using linear models
  A framework for online forecast reconciliation is developed via multivariate linear models on graph hierarchies, ridge regression, and recursive least squares, with a demonstration on district heating load data.
- Optimal Estimating Equations for Compact-Memory Hawkes Processes
  Compensator-based estimating equations unify several moment methods for compact-memory multivariate Hawkes processes, delivering uniform high-probability O(sqrt(log T / T)) rates, asymptotic normality, and exact efficiency-loss quantification relative to the likelihood score.
- Two-Sample Homogeneity Test via Entropic Optimal Transport
  Proposes and analyzes a homogeneity test using squared L2 distance of empirical EOT maps to uniform-on-ball reference, with FCLT, Gaussian quadratic null limit, consistency, local power, and weighted multiplier bootstrap.
- Comparing Two Categorical Gini Correlations with Applications to Classification Problems
  Proposes an inferential framework to test differences in categorical Gini correlations for predictor importance in classification, establishing asymptotic normality and consistency while accommodating unequal dimensions and dependence.
- Linking COPD Prevalence with Income Distribution: A Spatial Heterogeneous Compositional Regression via Geographically Weighted Penalized Approach
  A new geographically weighted penalized compositional regression model with pairwise fusion penalty is proposed to handle spatial heterogeneity and compositional covariates, demonstrated on U.S. income and COPD data.
- Proposing Topic Models and Evaluation Frameworks for Analyzing Associations with External Outcomes: An Application to Leadership Analysis Using Large-Scale Corporate Review Data
  An LLM-based topic modeling method with a custom evaluation framework improves topic interpretability, specificity, and polarity consistency over prior approaches when linking corporate review text to external outcomes such as employee morale.
- Principal Covariate Regression with Nuclear Norm Penalty
  Proposes PcovRnnp method enabling simultaneous dimension reduction and regularized coefficient estimation via nuclear norm penalty in high-dimensional settings.
- Explainable AI for Data-Driven Design of High-Dimensional Predictive Studies
  An AI recommender system improves Cox Proportional Hazards model performance for predicting patient falls by suggesting 23 feature exclusions, 2 non-linear terms, and 221 interactions, raising C-index from 0.805 to 0.815.
- Policy Learning with Observational Data: The Case of Hepatitis C Treatment for HIV/HCV Co-Infected Patients
  A weighted K-means plus decision-tree pipeline learns multi-action policies from observational data and is applied to HCV treatment choices for HIV co-infected patients, finding a high-clearance subgroup and potential cost savings of CAN$3.6-4.9 million.
- Feature Screening for High-Dimensional Structural Break Predictive Regression
  Develops SICS and RCRS screening methods for consistent selection of sparse active predictors and change points in high-dimensional structural break predictive regressions that may involve stationary or cointegrated series.
- Frozen Multimodal Embeddings for AI-Assisted Interview Assessment of Personality and Cognitive Ability
  Frozen multimodal embeddings with trait-specific late fusion cut personality prediction MSE by 19% relative to baseline in the 2026 AVI challenge, while cognitive results are attributed to validation shortcuts rather than content-based inference.
- GSA-YOLO: A High-Efficiency Framework via Structured Sparsity and Adaptive Knowledge Distillation for Real-Time X-ray Security Inspection
  GSA-YOLO modifies YOLOv8n with structured sparsity via Group Lasso and Sparse Structure Selection plus Adaptive Knowledge Distillation, reporting 189.62 FPS and mAP50:95 gains of 2.4% and 1.8% on HiXray and PIDray datasets.
- Choosing the Right Regularizer for Applied ML: Simulation Benchmarks of Popular Scikit-learn Regularization Frameworks
  Simulations show Ridge, Lasso, and ElasticNet perform similarly for prediction at high sample-to-feature ratios, but Lasso feature selection recall drops to 0.18 under high multicollinearity and low SNR while ElasticNet holds at 0.93.
- A machine-learning-assisted progressive digit-randomness screening framework for detecting non-random patterns in raw numerical research data
  FDRS combines digit frequency tests, association metrics, entropy, KL divergence, and ML models to assign risk grades to numerical datasets, showing separation between normal and irregular simulated data with high AUC.
- Evaluating Real-World Generalizability of Algorithm Selection Models
  Cross-benchmark study tests transfer of AS models between BBOB/CEC suites and robotics/UAV problems, identifying generalization failures in realistic settings.
- Smart Ensemble Learning Framework for Predicting Groundwater Heavy Metal Pollution
  Ensemble learning with Gaussian copula transformation predicts groundwater heavy metal pollution index with high accuracy (R²=0.96) while identifying key contaminants via clustering.
- fastml: Guarded Resampling Workflows for Safer Automated Machine Learning in R
  fastml is an R package that enforces leakage-free preprocessing through guarded resampling and provides a unified interface for safer automated ML including survival analysis.
- Exploring the Cosmic Dawn through the 21 cm Forest and High-redshift Radio Sources with the SKA
  Review of 21 cm forest observations as a probe of the epoch of reionization and fundamental physics using the Square Kilometre Array.
- Keeping Score: Efficiency Improvements in Neural Likelihood Surrogate Training via Score-Augmented Loss Functions