Regularizing neural networks by penalizing confident output distributions
6 Pith papers cite this work.
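The cited work's central idea, penalizing overconfident output distributions by subtracting a scaled entropy term from the cross-entropy loss, can be sketched in a few lines. This is a minimal illustration of the penalty, not the paper's implementation; the `beta` value and toy softmax setup are assumptions for the example.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    # Shannon entropy in nats.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def confidence_penalized_loss(logits, target, beta=0.1):
    # Cross-entropy minus beta * entropy: low-entropy (overconfident)
    # output distributions receive a larger effective loss, which
    # discourages the network from placing all mass on one class.
    p = softmax(logits)
    return -math.log(p[target]) - beta * entropy(p)
```

With `beta = 0` this reduces to plain cross-entropy; increasing `beta` rewards higher-entropy (less confident) predictions.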
Representative citing papers
-
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
BART introduces a denoising pretraining method for seq2seq models that matches RoBERTa on GLUE and SQuAD while setting new state-of-the-art results on abstractive summarization, dialogue, and QA, with gains of up to 6 ROUGE points.
-
Annotations Mitigate Post-Training Mode Collapse
Annotation-anchored training reduces semantic diversity collapse in post-trained language models by a factor of six compared to standard supervised fine-tuning while preserving instruction-following and improving with scale.
-
Can LLMs Learn to Reason Robustly under Noisy Supervision?
Online Label Refinement lets LLMs learn robust reasoning from noisy supervision by correcting labels when majority answers show rising rollout success and stable history, delivering 3-4% gains on math and reasoning benchmarks even at high noise levels.
-
Condensation Transition in Entropy-Constrained Probability Spaces
Below a critical entropy H_c ≈ log K - 1 + γ in the large-K limit, the typical fixed-entropy distribution on the probability simplex condenses so that one component holds a macroscopic probability fraction while the rest form a uniform background.
-
A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models
A patch-augmented cross-view regularization method reduces backdoor attack success rates in multimodal LLMs by enforcing output differences between original and perturbed views while using entropy constraints to preserve benign generation quality.
- DeepLévy: Learning Heavy-Tailed Uncertainty in Highly Volatile Time Series
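The condensation threshold quoted for "Condensation Transition in Entropy-Constrained Probability Spaces" above is easy to evaluate numerically. A minimal sketch, assuming γ denotes the Euler–Mascheroni constant (the summary does not define the symbol, so this is an assumption):

```python
import math

# Assumed interpretation of the γ in the summary's formula.
EULER_GAMMA = 0.5772156649015329

def critical_entropy(K, gamma=EULER_GAMMA):
    # H_c ≈ log K - 1 + γ, the large-K threshold quoted in the summary,
    # below which the typical fixed-entropy distribution condenses.
    return math.log(K) - 1 + gamma

def max_entropy(K):
    # Entropy of the uniform distribution on K outcomes (the upper bound).
    return math.log(K)
```

Under this reading, H_c sits a constant 1 - γ ≈ 0.423 nats below the maximum entropy log K, regardless of K.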