archive
Every paper Pith has read. Search by title, abstract, or pith.
924 papers in stat.ML · page 2
-
LOFT improves orthogonal fine-tuning via task-aware support selection
LOFT: Low-Rank Orthogonal Fine-Tuning via Task-Aware Support Selection
-
Two anchors make reward variance identifiable from preferences
Variance-aware Reward Modeling with Anchor Guidance
-
Kernel eigenvalue decay determines random forest rates
Minimax Rates and Spectral Distillation for Tree Ensembles
-
W-Flow reaches 1.29 FID in one ImageNet generation step
One-Step Generative Modeling via Wasserstein Gradient Flows
-
Sparse Bayesian KANs achieve near-minimax contraction
Posterior Contraction Rates for Sparse Kolmogorov-Arnold Networks in Anisotropic Besov Spaces
-
Active label queries cut U-statistic variance with fixed budget
Learning U-Statistics with Active Inference
-
Noise-subspace estimator matches minimax rate for probabilistic PLS
Exact Stiefel Optimization for Probabilistic PLS: Closed-Form Updates, Error Bounds, and Calibrated Uncertainty
-
Composite function stabilizes training of binary-activation networks
A Composite Activation Function for Learning Stable Binary Representations
-
Post-ADC inference restores valid stats after adaptive sampling
Post-ADC Inference: Valid Inference After Active Data Collection
-
Calibration algorithms adapt error bounds to unknown non-stationarity
Adaptive Calibration in Non-Stationary Environments
-
Vector codebook compresses KV cache 34x at 0.95 similarity
FibQuant: Universal Vector Quantization for Random-Access KV-Cache Compression
-
Barrier smoothing yields O(K^{-2/3}) stationarity for constrained bilevel opt
A Barrier-Metric First-Order Method for Linearly Constrained Bilevel Optimization
-
PPO reformulated to beat SAC in multi-task RL
TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing
-
Adapter adds closed-form spatial covariance to frozen predictors
Spatial Adapter: Structured Spatial Decomposition and Closed-Form Covariance for Frozen Predictors
-
Causal model recovers recourse effects from observational data
Causal Algorithmic Recourse: Foundations and Methods
-
Decompositions isolate bias pathways in generative models
Causal Bias Detection in Generative Artificial Intelligence
-
Causal paths break down survival disparities over time
Causal Fairness for Survival Analysis
-
Algorithm identifies ε-good subtrees without knowing ε
$\varepsilon$-Good Action Identification in Fixed-Budget Monte Carlo Tree Search
-
Coupled noises lift diversity in diffusion batches at zero added cost
Couple to Control: Joint Initial Noise Design in Diffusion Models
-
Dual form computes influence functions from data size not parameters
Extending Kernel Trick to Influence Functions
-
Stable barcodes track how dependency clusters evolve in dynamic Bayesian networks
A Stable Distance Persistence Homology for Dynamic Bayesian Network Clustering
-
Thompson sampling learns unknown networks while optimizing treatments
Adaptive Policy Learning Under Unknown Network Interference
-
Random spectra match Muon on GPT-2 training
Muon is Not That Special: Random or Inverted Spectra Work Just as Well
-
Kernel makes rotated 3D anisotropy explicit in Gaussian processes
Interpretable Machine Learning for Spatial Science: A Lie-Algebraic Kernel for Rotationally Anisotropic Gaussian Processes
-
VPR with mean-field predictives matches exact posteriors
Variational predictive resampling
-
Predictive resampling yields exact Bayesian posteriors
Variational predictive resampling
-
Synthesize likelihoods to meet accuracy bounds with minimal prior deviation
Sensor Design for Accuracy-Bounded Estimation via Maximum-Entropy Likelihood Synthesis
-
Neural tilting of Lévy measures enables jump-preserving SDE inference
Variational Inference for Lévy Process-Driven SDEs via Neural Tilting
-
k-step policy gradients escape myopic traps in restricted MDPs
Revisiting Policy Gradients for Restricted Policy Classes: Escaping Myopic Local Optima with $k$-step Policy Gradients
-
Transformer states converge uniformly to ODEs at rate O(1/L + 1/(L^{1/3} sqrt(H)))
Uniform Scaling Limits in AdamW-Trained Transformers
-
Reasoning helps LLM judges only on hard tasks
Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge
-
Linear networks store facts up to p log p = d²/2
Factual recall in linear associative memories: sharp asymptotics and mechanistic insights
-
Finite VC dimension enables finite-sample tests for distribution trade-offs
When Are Trade-Off Functions Testable from Finite Samples?
-
Tail extrapolation approximates best-of-N gradients from m much smaller than N
What should post-training optimize? A test-time scaling law perspective
-
LASSO matches homogeneous threshold for mixed-quality sparse data
Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data
-
Natural policy gradient equals smoothed policy iteration
Natural Policy Gradient as Doubly Smoothed Policy Iteration: A Bellman-Operator Framework
-
LLM personas match human survey distributions on stable questions
When Can Digital Personas Reliably Approximate Human Survey Findings?
-
Divide-and-conquer causal discovery extends to latent variables
A Recursive Decomposition Framework for Causal Structure Learning in the Presence of Latent Variables
-
Amortized networks speed up causal sensitivity bounds by orders of magnitude
Amortizing Causal Sensitivity Analysis via Prior Data-Fitted Networks
-
Bayesian linear solvers are special cases of affine PIMs
Affine Tracing: A New Paradigm for Probabilistic Linear Solvers
-
Confidence weights fuse modalities for long-tailed recognition
Simultaneous Long-tailed Recognition and Multi-modal Fusion for Highly Imbalanced Multi-modal Data
-
Bound certifies any learned controller for unknown linear systems
A PAC-Bayes Approach for Controlling Unknown Linear Discrete-time Systems
-
Semi-simulated tests pick different winners than real data for treatment effects
Real vs. Semi-Simulated: Rethinking Evaluation for Treatment Effect Estimation
-
Covariate-dependent level links low-fidelity quantiles to high-fidelity ones
Multi-Fidelity Quantile Regression
-
Sharp jumps in feature overlap set optimal neural scaling laws
Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks
-
Mass lift certifies regret in guided diffusion optimization
Regret Analysis of Guided Diffusion for Black-Box Optimization over Structured Inputs
-
Low-fidelity data yields kernels for high-fidelity PDE solving
Multifidelity Gaussian process regression for solving nonlinear partial differential equations
-
Unified taxonomy clarifies ML uncertainty for physics
Uncertainty in Physics and AI: Taxonomy, Quantification, and Validation
-
Expert losses cut MoE training time for time series
Fast Training of Mixture-of-Experts for Time Series Forecasting via Expert Loss Integration
-
Test error in augmented random features depends only on data and augmentation moments
Characterizing the Generalization Error of Random Feature Regression with Arbitrary Data-Augmentation