{"total":23,"items":[{"citing_arxiv_id":"2605.13434","ref_index":22,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity","primary_cat":"cs.LG","submitted_at":"2026-05-13T12:27:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes proportional to computation times, matching the known time lower bound in the leading term under non-convex smoothness and bounded heterogeneity.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11170","ref_index":212,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data","primary_cat":"cs.LG","submitted_at":"2026-05-11T19:28:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Asymmetric Langevin Unlearning uses public data to suppress unlearning noise costs by O(1/n_pub²), enabling practical mass unlearning with preserved utility under distribution mismatch.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11091","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"ASD-Bench: A Four-Axis Comprehensive Benchmark of AI Models for Autism Spectrum Disorder","primary_cat":"cs.LG","submitted_at":"2026-05-11T18:01:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ASD-Bench evaluates 17 ML and deep learning models on 4,068 AQ-10 records across child, adolescent, and adult cohorts, showing high adult performance, harder adolescent classification, shifting feature importance, and dissociation between accuracy and calibration.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10282","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Misspecified Universal Learning","primary_cat":"cs.IT","submitted_at":"2026-05-11T09:44:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Minimax regret is characterized for misspecified universal learning with log-loss, yielding the optimal universal learner as a unified framework for any uncertainty in the data-generating process.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10272","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"DP-LAC: Lightweight Adaptive Clipping for Differentially Private Federated Fine-tuning of Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-11T09:32:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"DP-LAC provides a new adaptive clipping technique for DP-SGD in federated LLM fine-tuning that improves accuracy by 6.6% on average without consuming additional privacy budget or requiring new 
hyperparameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09991","ref_index":146,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Optimizer-Induced Mode Connectivity: From AdamW to Muon","primary_cat":"cs.AI","submitted_at":"2026-05-11T05:07:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Optimizer choice induces distinct connected regions in the loss landscape of two-layer ReLU networks, with AdamW and Muon sometimes separated by provable barriers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09106","ref_index":36,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain","primary_cat":"cs.CL","submitted_at":"2026-05-09T18:28:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLMs copy biased analyst ratings in investment decisions but a new detection method encourages independent reasoning and can improve stock return predictions beyond human levels.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08529","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"The Propagation Field: A Geometric Substrate Theory of Deep Learning","primary_cat":"cs.LG","submitted_at":"2026-05-08T22:26:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Neural networks possess a propagation field of trajectories and Jacobians whose quality can be measured and optimized independently of endpoint loss, yielding better unseen-path generalization and reduced forgetting in continual learning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07844","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Distributional simplicity bias and effective convexity in Energy Based Models","primary_cat":"cs.LG","submitted_at":"2026-05-08T15:08:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Gradient flow in energy-based models for strictly positive binary distributions produces stable data-consistent fixed points and a learning hierarchy that favors lower-order interactions first, mechanistically explaining distributional simplicity bias.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06654","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less","primary_cat":"cs.LG","submitted_at":"2026-05-07T17:57:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Full finetuning with the pretraining optimizer reduces forgetting compared to other optimizers or LoRA while achieving comparable new-task 
performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06314","ref_index":35,"ref_count":2,"confidence":0.9,"is_internal_anchor":true,"paper_title":"When Does $\\ell_2$-Boosting Overfit Benignly? High-Dimensional Risk Asymptotics and the $\\ell_1$ Implicit Bias","primary_cat":"cs.LG","submitted_at":"2026-05-07T14:14:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"ℓ₂-Boosting exhibits benign overfitting with logarithmic excess variance decay Θ(σ²/log(p/n)) under isotropic noise due to ℓ₁ bias, and a subdifferential early stopping rule recovers minimax-optimal ℓ₁ rates.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.02853","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Trust, but Verify: Peeling Low-Bit Transformer Networks for Training Monitoring","primary_cat":"cs.LG","submitted_at":"2026-05-04T17:30:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A layer-wise peeling framework creates reference bounds to diagnose under-optimized layers in trained decoder-only transformers, including low-bit and quantized versions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.25965","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Adversarial Robustness of NTK Neural Networks","primary_cat":"stat.ML","submitted_at":"2026-04-28T04:49:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"NTK networks achieve minimax optimal adversarial regression rates in Sobolev spaces with early stopping, but minimum-norm interpolants are vulnerable.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.24952","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization","primary_cat":"cs.CV","submitted_at":"2026-04-27T19:49:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Semi-DPO applies semi-supervised learning to noisy preference data in diffusion DPO by training first on consensus pairs then iteratively pseudo-labeling conflicts, yielding state-of-the-art alignment with complex human preferences.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20365","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Benefits of Low-Cost Bio-Inspiration in the Age of Overparametrization","primary_cat":"cs.RO","submitted_at":"2026-04-22T09:02:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Shallow MLPs and dense CPGs outperform deeper MLPs and Actor-Critic RL in bounded robot control tasks with limited proprioception, with a Parameter Impact metric indicating extra RL parameters yield no performance gain over evolutionary 
strategies.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.19643","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"A Gesture-Based Visual Learning Model for Acoustophoretic Interactions using a Swarm of AcoustoBots","primary_cat":"cs.RO","submitted_at":"2026-04-21T16:32:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"OpenCLIP-based gesture classification with linear probing controls AcoustoBot swarms at 87.8% accuracy and 3.95 s latency in controlled tests.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.14017","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Stochastic Trust-Region Methods for Over-parameterized Models","primary_cat":"math.OC","submitted_at":"2026-04-15T15:57:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Stochastic trust-region methods achieve O(ε^{-2} log(1/ε)) complexity for unconstrained problems and O(ε^{-4} log(1/ε)) for equality-constrained problems under the strong growth condition, with experiments showing stable performance comparable to tuned baselines without learning-rate scheduling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.10860","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Stochastic Modified Equations for Stochastic Gradient Descent in Infinite-Dimensional Hilbert Spaces","primary_cat":"math.OC","submitted_at":"2026-04-12T23:55:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SGD dynamics in Hilbert spaces are approximated by an SDE with cylindrical noise, with the weak error between discrete and continuous versions shown to be second order in the step size.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09258","ref_index":48,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Nexus: Same Pretraining Loss, Better Downstream Generalization via Common Minima","primary_cat":"cs.LG","submitted_at":"2026-04-10T12:17:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Nexus optimizer improves LLM downstream performance by converging to common minima across data sources despite identical pretraining loss.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.05583","ref_index":77,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"WRF4CIR: Weight-Regularized Fine-Tuning Network for Composed Image Retrieval","primary_cat":"cs.CV","submitted_at":"2026-04-07T08:23:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"WRF4CIR uses weight-regularized fine-tuning with adversarial perturbations to mitigate overfitting in composed image retrieval and narrows the generalization gap on 
benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2201.02177","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets","primary_cat":"cs.LG","submitted_at":"2022-01-06T18:43:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Neural networks exhibit grokking on small algorithmic datasets, achieving perfect generalization well after overfitting.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1712.00409","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Deep Learning Scaling is Predictable, Empirically","primary_cat":"cs.LG","submitted_at":"2017-12-01T17:13:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Deep learning generalization error follows power-law scaling with training set size across multiple domains, with model size scaling sublinearly with data size.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1610.01644","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Understanding intermediate layers using linear classifier probes","primary_cat":"stat.ML","submitted_at":"2016-10-05T20:59:01+00:00","verdict":"ACCEPT","verdict_confidence":"HIGH","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Linear probes demonstrate that feature separability for classification increases monotonically with network depth in Inception v3 and ResNet-50.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}