{"total":18,"items":[{"citing_arxiv_id":"2606.30814","ref_index":127,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"When Calibration Rankings Reverse: Accuracy-Controlled Evaluation for Fair Comparison of LLMs","primary_cat":"cs.CL","submitted_at":"2026-06-29T18:37:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Global calibration metrics like ECE are confounded by accuracy; the proposed ACE framework with three accuracy-controlled views shows many prior calibration advantages weaken or reverse.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.28654","ref_index":35,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"FedLAS: Feature-Modulated Bidirectional Label Smoothing for Neural Network Calibration","primary_cat":"cs.CV","submitted_at":"2026-06-26T23:55:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"FedLAS adds feature-norm based confidence detection and bidirectional gating to label smoothing losses to reduce calibration error on vision benchmarks while preserving accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28869","ref_index":32,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Balancing Multimodal Learning through Label Space Reshaping","primary_cat":"cs.LG","submitted_at":"2026-05-22T08:22:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"BMLR reshapes the cross-modal label space to equalize mapping difficulty and balance optimization across modalities in multimodal learning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21055","ref_index":26,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Genetic Programming with Transformer-Based Mutation for Approximate Circuit Design","primary_cat":"cs.NE","submitted_at":"2026-05-20T11:42:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A hybrid CGP scheme with a transformer mutation operator evolves approximate multipliers that achieve better error-power trade-offs than the EvoApproxLib library for several target constraints.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17575","ref_index":44,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"UniAlign: A Model-Agnostic Framework for Robust Network Traffic Classification under Distribution Shifts","primary_cat":"cs.LG","submitted_at":"2026-05-17T18:02:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"UniAlign improves robustness of deep learning NTC models under distribution shifts via domain alignment fine-tuning and stable ensembling, yielding 2.51% accuracy and 2.71% F1 gains over standard training on three public datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10364","ref_index":23,"ref_count":3,"confidence":0.98,"is_internal_anchor":true,"paper_title":"DeepL\\'evy: Learning Heavy-Tailed Uncertainty in Highly Volatile Time Series","primary_cat":"cs.LG","submitted_at":"2026-05-11T11:08:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DeepLévy learns mixtures of Lévy stable distributions for heavy-tailed time series forecasting by minimizing discrepancies between empirical and parametric characteristic functions, outperforming prior methods on tail risk metrics under extreme volatility.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"as a soundness argument rather than a new standalone theory contribution. We provide a formal proposition and a finite-batch moment lemma for empirical CF estimation in Appendix B.3; the lemma also explains why larger mini-batches reduce optimization noise. Entropy Regularization.To encourage the model to utilize multiple mixture components and prevent mode collapse [23], we add an entropy regularization term on the mixing weights and the λent ≥0controls the regularization strength: Ltotal =L CF −λ entH H= 1 BH BX b=1 HX h=1 − KX k=1 π(h,b) k ln(π(h,b) k +ϵ H) ! (14) Gradient Clipping.Heavy-tailed distributions can produce large gradient magnitudes. To ensure stable training, we apply Gradient Clipping [22] with the parameter update rule:"},{"citing_arxiv_id":"2605.09995","ref_index":37,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Annotations Mitigate Post-Training Mode Collapse","primary_cat":"cs.CL","submitted_at":"2026-05-11T05:11:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Annotation-anchored training reduces semantic diversity collapse in post-trained language models by a factor of six compared to standard supervised fine-tuning while preserving instruction-following and improving with scale.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08967","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Condensation Transition in Entropy-Constrained Probability Spaces","primary_cat":"cond-mat.stat-mech","submitted_at":"2026-05-09T14:22:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Below a critical entropy H_c ≈ log K - 1 + γ in the large-K limit, the typical fixed-entropy distribution on the probability simplex condenses so that one component holds a macroscopic probability fraction while the rest form a uniform background.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"sity of predictions necessary for robust generalization. Our findings are also in line with observations in com- munity ecology, where a few abundant species typically dominate over a long tail of rare ones [25]. A sharp de- cline in diversity often results in the emergence of a single dominant species, a phenomenon usually attributed to competitive exclusion [26]. Our results suggest that the emergence of dominance may partly reflect a geometric effect intrinsic to constrained probability spaces, inde- pendent of the specific ecological mechanisms involved. Just as species compete for finite niches, states in a stochastic model compete for a fixed number of prob- ability quanta. When the system's entropy is restricted,"},{"citing_arxiv_id":"2604.04488","ref_index":79,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models","primary_cat":"cs.CV","submitted_at":"2026-04-06T07:27:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A patch-augmented cross-view regularization method reduces backdoor attack success rates in multimodal LLMs by enforcing output differences between original and perturbed views while using entropy constraints to preserve benign generation quality.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"tribution difference under different views may induce the model to produce over- confident extreme predictions at some locations, which in turn leads to out- put distribution collapse or even degra- dation of normal generation capability. To avoid this side effect, we further introduce uncertainty-aware regulariza- tion constraints at the output layer [79] to limit the degree of over-concentration of the model's predictions from the perspec- tive of distributional entropy, in order to stabilize the training process and main- tain the diversity of generation on normal samples. For the output distributionp b,t of theb-th sample in the batch at thet-th token position, the information entropy is defined as:"},{"citing_arxiv_id":"2604.03993","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Can LLMs Learn to Reason Robustly under Noisy Supervision?","primary_cat":"cs.LG","submitted_at":"2026-04-05T06:30:50+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Online Label Refinement lets LLMs learn robust reasoning from noisy supervision by correcting labels when majority answers show rising rollout success and stable history, delivering 3-4% gains on math and reasoning benchmarks even at high noise levels.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"Step 4: High-probability bound. Applying Azuma-Hoeffding to ∑t s=0 Ms(xn), since each |Ms(xn)| ≤ ηC p log(1/δ)/K, we obtain Pr \u0010 t ∑ s=0 Ms(xn) ≥ ϵ \u0011 ≤ 2 exp \u0010 − ϵ2 2tη2C2 log(1/δ)/K \u0011 . (19) Setting ϵ = tηC p log(1/δ)/K gives probability at most δ. A.4 Cross-Sample Coupling Define cross-sample coupling: Γ(xi, xj) = ∇θ log π(y⋆(xj)|xj) · ∇θ log π(y⋆(xi)|xi). (20) Assume: Exc∼Dclean,xn∼Dnoise[Γ(xc, xn)] ≥ γ > 0. Then the deterministic drift for noise sample correct log-probability is: ∆t = γ(1 − ρ)Gc − ρGn, where Gc and Gn denote average advantage magnitudes over clean and noisy samples. 18 A.5 High-Probability Early Correctness Coherence Theorem A.4 (High-Probability Early Correctness Coherence) . Suppose p0(y⋆) > p0( ˜y), γ > 0, and"},{"citing_arxiv_id":"2602.12687","ref_index":13,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Trust the uncertain teacher: distilling dark knowledge via calibrated uncertainty","primary_cat":"cs.LG","submitted_at":"2026-02-13T07:43:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CUD reshapes the teacher's predictive distribution before distillation so that students receive calibrated uncertainty signals alongside accuracy, yielding more robust and better-calibrated models on high-cardinality and distribution-shift benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2508.17412","ref_index":29,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Ridge Too Far: Correcting Over-Shrinkage via Negative Regularization","primary_cat":"cs.LG","submitted_at":"2025-08-24T15:34:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Negative-capable ridge regression uses controlled negative regularization as anti-shrinkage to increase effective complexity along weak eigendirections and mitigate underfitting in small-data regression.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2508.07285","ref_index":112,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Non-Intrusive Automatic Speech Recognition Refinement: A Survey","primary_cat":"eess.AS","submitted_at":"2025-08-10T10:46:14+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A survey that classifies non-intrusive ASR refinement methods into five categories, reviews domain adaptation and evaluation datasets, proposes standardized metrics, and identifies future research directions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[65], [12], [66], [67], [68], [17], [69], [70] [71], [72], [11], [73], [74], [47], [15], [16], [75], [76], [77], [78], [79], [80], [81], [82], [14], [83], [84], [85] [86], [87], [88], [89], [90], [91], [92], [93] [94], [95], [96], [97], [98], [99] [100], [101], [20], [102], [21], [103], [104], [105], [106], [107], [108] MWE LS ILMT [22] [109], [110] [111], [112], [29] Figure 3: Overview of methods for non-intrusive ASR refinement, grouped into Fusion, Rescoring, Correction, Distillation, and Training Adjustment. Each branch shows subcategories with representative studies. Compared to Density Ratio-which subtracts a separately trained source-domain LM under a hybrid-style factorization- Meng et al. in [31] proposed internal LM estimation (ILME) that employs Joint Softmax Approximation to estimate"},{"citing_arxiv_id":"1910.13461","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension","primary_cat":"cs.CL","submitted_at":"2019-10-29T18:01:00+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"BART introduces a denoising pretraining method for seq2seq models that matches RoBERTa on GLUE and SQuAD while setting new state-of-the-art results on abstractive summarization, dialogue, and QA with up to 6 ROUGE gains.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.11202","ref_index":24,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Unsupervised Domain Adaptation via Calibrating Uncertainties","primary_cat":"cs.LG","submitted_at":"2019-07-25T17:02:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A new regularization approach for unsupervised domain adaptation that calibrates Renyi entropy of uncertainties estimated via variational Bayes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.06757","ref_index":24,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AugLabel: Exploiting Word Representations to Augment Labels for Face Attribute Classification","primary_cat":"cs.CV","submitted_at":"2019-07-15T21:15:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Augmenting face attribute labels with word2vec embeddings improves deep classifier performance on CelebA and LFWA and reaches comparable accuracy with 50% less labeled data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.06017","ref_index":24,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition","primary_cat":"eess.AS","submitted_at":"2019-07-13T06:27:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Knowledge distillation from an external RNN language model to a seq2seq ASR model yields 9.3% CER on Chinese datasets, an 18.42% relative improvement over the baseline without test-time fusion components.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.03187","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Applying a Pre-trained Language Model to Spanish Twitter Humor Prediction","primary_cat":"cs.CL","submitted_at":"2019-07-06T21:05:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A Spanish Twitter language model trained from scratch with label smoothing placed 3rd and 2nd in the HAHA 2019 humor classification and regression tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}