arxiv: 1808.08745 · v1 · submitted 2018-08-27 · 💻 cs.CL

Recognition: unknown

Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

Mirella Lapata, Shashi Narayan, Shay B. Cohen

classification 💻 cs.CL
keywords: abstractive, summarization, article, convolutional, extractive, extreme, networks, neural

We introduce extreme summarization, a new single-document summarization task which does not favor extractive strategies and calls for an abstractive modeling approach. The idea is to create a short, one-sentence news summary answering the question "What is the article about?". We collect a real-world, large-scale dataset for this task by harvesting online articles from the British Broadcasting Corporation (BBC). We propose a novel abstractive model which is conditioned on the article's topics and based entirely on convolutional neural networks. We demonstrate experimentally that this architecture captures long-range dependencies in a document and recognizes pertinent content, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans.

Forward citations

Cited by 12 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Chain-based Distillation for Effective Initialization of Variable-Sized Small Language Models

    cs.CL 2026-05 unverdicted novelty 7.0

    Chain-based Distillation constructs a sequence of anchor models to enable efficient initialization of variable-sized SLMs through interpolation, with bridge distillation for cross-architecture transfer, yielding bette...

  2. Segmenting Human-LLM Co-authored Text via Change Point Detection

    cs.CL 2026-05 unverdicted novelty 7.0

    Adapts change point detection to segment human-LLM co-authored text using weighted and generalized algorithms with minimax optimality and strong empirical results against baselines.

  3. Analysis and Explainability of LLMs Via Evolutionary Methods

    cs.NE 2026-04 unverdicted novelty 7.0

    Evolutionary trees from LLM weights recover ground-truth training topologies and identify key datasets and layers through phenotypic analysis.

  4. Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language Models

    cs.CL 2026-04 unverdicted novelty 7.0

    Supervised fine-tuning of LLMs often fails to fully internalize all training instances due to five recurring causes including missing prerequisites and data conflicts, as diagnosed via a new framework across multiple models.

  5. Multitask Prompted Training Enables Zero-Shot Task Generalization

    cs.LG 2021-10 conditional novelty 7.0

    Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.

  6. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

    cs.LG 2021-01 accept novelty 7.0

    Switch Transformers use top-1 expert routing in a Mixture of Experts setup to scale to trillion-parameter language models with constant compute and up to 4x speedup over T5-XXL.

  7. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

    cs.CL 2019-10 accept novelty 7.0

    BART introduces a denoising pretraining method for seq2seq models that matches RoBERTa on GLUE and SQuAD while setting new state-of-the-art results on abstractive summarization, dialogue, and QA with up to 6 ROUGE gains.

  8. Are Large Language Models Economically Viable for Industry Deployment?

    cs.CL 2026-04 unverdicted novelty 6.0

    Small LLMs under 2B parameters achieve better economic break-even, energy efficiency, and hardware density than larger models on legacy GPUs for industrial tasks.

  9. Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification

    cs.AI 2026-04 unverdicted novelty 6.0

    Cross-model semantic disagreement adds an epistemic uncertainty term that improves total uncertainty estimation over self-consistency alone, helping flag confident errors in LLMs.

  10. ST-MoE: Designing Stable and Transferable Sparse Expert Models

    cs.CL 2022-02 unverdicted novelty 6.0

    ST-MoE introduces stability techniques for sparse expert models, allowing a 269B-parameter model to achieve state-of-the-art transfer learning results across reasoning, summarization, and QA tasks at the compute cost ...

  11. Calibrating Model-Based Evaluation Metrics for Summarization

    cs.CL 2026-04 unverdicted novelty 5.0

    A reference-free proxy scoring framework combined with GIRB calibration produces better-aligned evaluation metrics for summarization and outperforms baselines across seven datasets.

  12. LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

    cs.CL 2024-12 accept novelty 3.0

    A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.