A Deep Reinforced Model for Abstractive Summarization

Caiming Xiong; Richard Socher; Romain Paulus

arxiv: 1705.04304 · v3 · pith:WLHOYRRInew · submitted 2017-05-11 · 💻 cs.CL

A Deep Reinforced Model for Abstractive Summarization

Romain Paulus , Caiming Xiong , Richard Socher This is my paper

classification 💻 cs.CL

keywords modelmodelspredictionsummariestrainingabstractivedailyhowever

0 comments

read the original abstract

Attentional, RNN-based encoder-decoder models for abstractive summarization have achieved good performance on short input and output sequences. For longer documents and summaries however these models often include repetitive and incoherent phrases. We introduce a neural network model with a novel intra-attention that attends over the input and continuously generated output separately, and a new training method that combines standard supervised word prediction and reinforcement learning (RL). Models trained only with supervised learning often exhibit "exposure bias" - they assume ground truth is provided at each step during training. However, when standard word prediction is combined with the global sequence prediction training of RL the resulting summaries become more readable. We evaluate this model on the CNN/Daily Mail and New York Times datasets. Our model obtains a 41.16 ROUGE-1 score on the CNN/Daily Mail dataset, an improvement over previous state-of-the-art models. Human evaluation also shows that our model produces higher quality summaries.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 11 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
cs.AI 2025-03 conditional novelty 7.0

Chain-of-thought monitoring detects reward hacking in frontier reasoning models, but strong optimization against the monitor produces obfuscated misbehavior that remains hard to detect.
Learning to summarize from human feedback
cs.CL 2020-09 conditional novelty 7.0

Reinforcement learning on a reward model trained from human summary comparisons produces summaries humans prefer over supervised fine-tuning or human references on TL;DR and transfers to CNN/DM.
Dota 2 with Large Scale Deep Reinforcement Learning
cs.LG 2019-12 accept novelty 7.0

OpenAI Five achieved superhuman performance in Dota 2 by defeating the world champions using scaled self-play reinforcement learning.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
cs.LG 2019-10 unverdicted novelty 7.0

T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colo...
Fine-Tuning Language Models from Human Preferences
cs.CL 2019-09 unverdicted novelty 7.0

Language models fine-tuned via RL on 5k-60k human preference comparisons produce stylistically better text continuations and human-preferred summaries that sometimes copy input sentences.
Assisted Counterspeech Writing at the Crossroads of Hate Speech and Misinformation
cs.CL 2026-05 conditional novelty 6.0

LLMs generate adequate counterspeech for co-occurring hate and misinformation in 40% of cases, with a mixed knowledge strategy from fact-checkers and NGOs proving most effective after expert revision.
Training Language Models to Self-Correct via Reinforcement Learning
cs.LG 2024-09 unverdicted novelty 6.0

SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.
Enriching and Controlling Global Semantics for Text Summarization
cs.CL 2021-09 unverdicted novelty 5.0

A normalizing-flow neural topic model plus control mechanism are added to Transformer summarizers to supply and regulate global semantics, with reported gains over prior models on five benchmarks.
Attention Is All You Need
cs.CL 2017-06 unverdicted novelty 5.0

Pith review generated a malformed one-line summary.
Ranking sentences from product description & bullets for better search
cs.IR 2019-07 unverdicted novelty 4.0

Two RL-based extractive summarization models rank sentences from product fields by leveraging titles and click-through logs to improve search relevance.
Multiplicative Models for Recurrent Language Modeling
cs.LG 2019-06 unverdicted novelty 3.0

New multiplicative RNN models are tested on char-level LM tasks to demonstrate the relevance of shared parametrization for the intermediate state.