Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

Shashi Narayan, Shay B Cohen, Mirella Lapata · 2018 · cs.CL · arXiv 1808.08745

25 Pith papers cite this work. Polarity classification is still indexing.

abstract

We introduce extreme summarization, a new single-document summarization task which does not favor extractive strategies and calls for an abstractive modeling approach. The idea is to create a short, one-sentence news summary answering the question "What is the article about?". We collect a real-world, large-scale dataset for this task by harvesting online articles from the British Broadcasting Corporation (BBC). We propose a novel abstractive model which is conditioned on the article's topics and based entirely on convolutional neural networks. We demonstrate experimentally that this architecture captures long-range dependencies in a document and recognizes pertinent content, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans.

citation-role summary

dataset: 2 · background: 1

citation-polarity summary

use dataset: 2 · background: 1

representative citing papers

VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models

cs.CL · 2025-08-21 · unverdicted · novelty 7.0

VocabTailor introduces a decoupled dynamic vocabulary selection framework that reduces vocabulary-related memory in SLMs by up to 99% with minimal task performance loss.

Multitask Prompted Training Enables Zero-Shot Task Generalization

cs.LG · 2021-10-15 · conditional · novelty 7.0

Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.

Chain-based Distillation for Effective Initialization of Variable-Sized Small Language Models

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

Chain-based Distillation constructs a sequence of anchor models to enable efficient initialization of variable-sized SLMs through interpolation, with bridge distillation for cross-architecture transfer, yielding better performance than scratch training.

Segmenting Human-LLM Co-authored Text via Change Point Detection

cs.CL · 2026-05-05 · unverdicted · novelty 7.0

Adapts change point detection to segment human-LLM co-authored text using weighted and generalized algorithms with minimax optimality and strong empirical results against baselines.

Analysis and Explainability of LLMs Via Evolutionary Methods

cs.NE · 2026-04-27 · unverdicted · novelty 7.0

Evolutionary trees from LLM weights recover ground-truth training topologies and identify key datasets and layers through phenotypic analysis.

Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language Models

cs.CL · 2026-04-11 · unverdicted · novelty 7.0

Supervised fine-tuning of LLMs often fails to fully internalize all training instances due to five recurring causes including missing prerequisites and data conflicts, as diagnosed via a new framework across multiple models.

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

cs.LG · 2021-01-11 · accept · novelty 7.0

Switch Transformers use top-1 expert routing in a Mixture of Experts setup to scale to trillion-parameter language models with constant compute and up to 4x speedup over T5-XXL.

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

cs.CL · 2019-10-29 · accept · novelty 7.0

BART introduces a denoising pretraining method for seq2seq models that matches RoBERTa on GLUE and SQuAD while setting new state-of-the-art results on abstractive summarization, dialogue, and QA with up to 6 ROUGE gains.

LLM Self-Recognition: Steering and Retrieving Activation Signatures

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

Steering LLM residual streams with random sparse vectors creates detectable self-recognition fingerprints that enable over 98% accurate attribution of generated text to specific models without degrading output quality.

Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

Adapting multi-objective pure-exploration bandits enables efficient Pareto prompt set recovery and best feasible prompt identification for LLMs, with linear-case guarantees and empirical gains over baselines.

SnapAudit: Active Auditing of Differentially Private In-Context Learning via Snapshot-Based Simulation

cs.CR · 2025-11-17 · conditional · novelty 6.0

SnapAudit decomposes DP-ICL into a deterministic snapshot stage and a stochastic noise stage, using bootstrap simulation to achieve 80-200x faster auditing and exposing privacy bound violations in existing Gaussian and embedding mechanisms.

Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct

cs.LG · 2024-10-02 · unverdicted · novelty 6.0

Llama3-8b-Instruct recognizes its own outputs via a residual-stream vector associated with self-authorship that can be steered to control authorship claims and perceptions.

LLM Evaluators Recognize and Favor Their Own Generations

cs.CL · 2024-04-15 · unverdicted · novelty 6.0

LLMs show measurable self-recognition that linearly correlates with self-preference bias in evaluations, supported by fine-tuning experiments and controls for confounders.

H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

cs.LG · 2023-06-24 · unverdicted · novelty 6.0

H2O evicts non-heavy-hitter tokens from the KV cache using a dynamic submodular policy, retaining recent and frequent-co-occurrence tokens to reduce memory while preserving accuracy.

Can AI-Generated Text be Reliably Detected?

cs.CL · 2023-03-17 · unverdicted · novelty 6.0

Recursive paraphrasing attacks substantially lower detection rates for multiple AI text detectors with only minor quality loss, while a theoretical analysis ties best-case AUROC to total variation distance between human and AI distributions.

Are Large Language Models Economically Viable for Industry Deployment?

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

Small LLMs under 2B parameters achieve better economic break-even, energy efficiency, and hardware density than larger models on legacy GPUs for industrial tasks.

Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification

cs.AI · 2026-04-18 · unverdicted · novelty 6.0

Cross-model semantic disagreement adds an epistemic uncertainty term that improves total uncertainty estimation over self-consistency alone, helping flag confident errors in LLMs.

ST-MoE: Designing Stable and Transferable Sparse Expert Models

cs.CL · 2022-02-17 · unverdicted · novelty 6.0

ST-MoE introduces stability techniques for sparse expert models, allowing a 269B-parameter model to achieve state-of-the-art transfer learning results across reasoning, summarization, and QA tasks at the compute cost of a 32B dense model.

CoX-MoE: Coalesced Expert Execution for High-Throughput MoE Inference with AMX-Enabled CPU-GPU Co-Execution

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

CoX-MoE achieves up to 7.1x higher throughput than FlexGen for MoE inference via coalesced expert execution and AMX-enabled CPU-GPU orchestration with static expert stratification.

On the Limits of Layer Pruning for Generative Reasoning in Large Language Models

cs.LG · 2026-02-02 · unverdicted · novelty 5.0

Layer pruning preserves classification performance in LLMs but fundamentally limits recovery of generative reasoning capabilities even after extensive self-supervised finetuning.

Towards Understanding, Analyzing, and Optimizing Agentic AI Execution: A CPU-Centric Perspective

cs.AI · 2025-11-01 · conditional · novelty 5.0

The paper analyzes CPU bottlenecks in agentic AI serving, selects representative workloads, and demonstrates that CPU-aware scheduling optimizations COMB and MAS can reduce P50 latency by up to 1.7x and total latency by up to 2.49x on two hardware systems.

Enriching and Controlling Global Semantics for Text Summarization

cs.CL · 2021-09-22 · unverdicted · novelty 5.0

A normalizing-flow neural topic model plus control mechanism are added to Transformer summarizers to supply and regulate global semantics, with reported gains over prior models on five benchmarks.

Calibrating Model-Based Evaluation Metrics for Summarization

cs.CL · 2026-04-19 · unverdicted · novelty 5.0

A reference-free proxy scoring framework combined with GIRB calibration produces better-aligned evaluation metrics for summarization and outperforms baselines across seven datasets.

Retrieval-Augmented Generation for Large Language Models: A Survey

cs.CL · 2023-12-18 · unverdicted · novelty 3.0

A survey of RAG paradigms, components, benchmarks, and challenges for improving LLMs on knowledge-intensive tasks.

citing papers explorer

Showing 25 of 25 citing papers.

VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models
cs.CL · 2025-08-21 · unverdicted · none · ref 12 · internal anchor
VocabTailor introduces a decoupled dynamic vocabulary selection framework that reduces vocabulary-related memory in SLMs by up to 99% with minimal task performance loss.

Multitask Prompted Training Enables Zero-Shot Task Generalization
cs.LG · 2021-10-15 · conditional · none · ref 39 · internal anchor
Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.

Chain-based Distillation for Effective Initialization of Variable-Sized Small Language Models
cs.CL · 2026-05-08 · unverdicted · none · ref 91
Chain-based Distillation constructs a sequence of anchor models to enable efficient initialization of variable-sized SLMs through interpolation, with bridge distillation for cross-architecture transfer, yielding better performance than scratch training.

Segmenting Human-LLM Co-authored Text via Change Point Detection
cs.CL · 2026-05-05 · unverdicted · none · ref 8
Adapts change point detection to segment human-LLM co-authored text using weighted and generalized algorithms with minimax optimality and strong empirical results against baselines.

Analysis and Explainability of LLMs Via Evolutionary Methods
cs.NE · 2026-04-27 · unverdicted · none · ref 28
Evolutionary trees from LLM weights recover ground-truth training topologies and identify key datasets and layers through phenotypic analysis.

Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language Models
cs.CL · 2026-04-11 · unverdicted · none · ref 5
Supervised fine-tuning of LLMs often fails to fully internalize all training instances due to five recurring causes including missing prerequisites and data conflicts, as diagnosed via a new framework across multiple models.

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
cs.LG · 2021-01-11 · accept · none · ref 22
Switch Transformers use top-1 expert routing in a Mixture of Experts setup to scale to trillion-parameter language models with constant compute and up to 4x speedup over T5-XXL.

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
cs.CL · 2019-10-29 · accept · none · ref 15
BART introduces a denoising pretraining method for seq2seq models that matches RoBERTa on GLUE and SQuAD while setting new state-of-the-art results on abstractive summarization, dialogue, and QA with up to 6 ROUGE gains.

LLM Self-Recognition: Steering and Retrieving Activation Signatures
cs.AI · 2026-06-04 · unverdicted · none · ref 7 · internal anchor
Steering LLM residual streams with random sparse vectors creates detectable self-recognition fingerprints that enable over 98% accurate attribution of generated text to specific models without degrading output quality.

Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits
cs.LG · 2026-05-14 · unverdicted · none · ref 16 · internal anchor
Adapting multi-objective pure-exploration bandits enables efficient Pareto prompt set recovery and best feasible prompt identification for LLMs, with linear-case guarantees and empirical gains over baselines.

SnapAudit: Active Auditing of Differentially Private In-Context Learning via Snapshot-Based Simulation
cs.CR · 2025-11-17 · conditional · none · ref 22 · internal anchor
SnapAudit decomposes DP-ICL into a deterministic snapshot stage and a stochastic noise stage, using bootstrap simulation to achieve 80-200x faster auditing and exposing privacy bound violations in existing Gaussian and embedding mechanisms.

Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct
cs.LG · 2024-10-02 · unverdicted · none · ref 11 · internal anchor
Llama3-8b-Instruct recognizes its own outputs via a residual-stream vector associated with self-authorship that can be steered to control authorship claims and perceptions.

LLM Evaluators Recognize and Favor Their Own Generations
cs.CL · 2024-04-15 · unverdicted · none · ref 18 · internal anchor
LLMs show measurable self-recognition that linearly correlates with self-preference bias in evaluations, supported by fine-tuning experiments and controls for confounders.

H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
cs.LG · 2023-06-24 · unverdicted · none · ref 48 · internal anchor
H2O evicts non-heavy-hitter tokens from the KV cache using a dynamic submodular policy, retaining recent and frequent-co-occurrence tokens to reduce memory while preserving accuracy.

Can AI-Generated Text be Reliably Detected?
cs.CL · 2023-03-17 · unverdicted · none · ref 36 · internal anchor
Recursive paraphrasing attacks substantially lower detection rates for multiple AI text detectors with only minor quality loss, while a theoretical analysis ties best-case AUROC to total variation distance between human and AI distributions.

Are Large Language Models Economically Viable for Industry Deployment?
cs.CL · 2026-04-21 · unverdicted · none · ref 58
Small LLMs under 2B parameters achieve better economic break-even, energy efficiency, and hardware density than larger models on legacy GPUs for industrial tasks.

Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification
cs.AI · 2026-04-18 · unverdicted · none · ref 33
Cross-model semantic disagreement adds an epistemic uncertainty term that improves total uncertainty estimation over self-consistency alone, helping flag confident errors in LLMs.

ST-MoE: Designing Stable and Transferable Sparse Expert Models
cs.CL · 2022-02-17 · unverdicted · none · ref 179
ST-MoE introduces stability techniques for sparse expert models, allowing a 269B-parameter model to achieve state-of-the-art transfer learning results across reasoning, summarization, and QA tasks at the compute cost of a 32B dense model.

CoX-MoE: Coalesced Expert Execution for High-Throughput MoE Inference with AMX-Enabled CPU-GPU Co-Execution
cs.LG · 2026-05-18 · unverdicted · none · ref 19 · internal anchor
CoX-MoE achieves up to 7.1x higher throughput than FlexGen for MoE inference via coalesced expert execution and AMX-enabled CPU-GPU orchestration with static expert stratification.

On the Limits of Layer Pruning for Generative Reasoning in Large Language Models
cs.LG · 2026-02-02 · unverdicted · none · ref 21 · internal anchor
Layer pruning preserves classification performance in LLMs but fundamentally limits recovery of generative reasoning capabilities even after extensive self-supervised finetuning.

Towards Understanding, Analyzing, and Optimizing Agentic AI Execution: A CPU-Centric Perspective
cs.AI · 2025-11-01 · conditional · none · ref 24 · internal anchor
The paper analyzes CPU bottlenecks in agentic AI serving, selects representative workloads, and demonstrates that CPU-aware scheduling optimizations COMB and MAS can reduce P50 latency by up to 1.7x and total latency by up to 2.49x on two hardware systems.

Enriching and Controlling Global Semantics for Text Summarization
cs.CL · 2021-09-22 · unverdicted · none · ref 26 · internal anchor
A normalizing-flow neural topic model plus control mechanism are added to Transformer summarizers to supply and regulate global semantics, with reported gains over prior models on five benchmarks.

Calibrating Model-Based Evaluation Metrics for Summarization
cs.CL · 2026-04-19 · unverdicted · none · ref 14
A reference-free proxy scoring framework combined with GIRB calibration produces better-aligned evaluation metrics for summarization and outperforms baselines across seven datasets.

Retrieval-Augmented Generation for Large Language Models: A Survey
cs.CL · 2023-12-18 · unverdicted · none · ref 153 · internal anchor
A survey of RAG paradigms, components, benchmarks, and challenges for improving LLMs on knowledge-intensive tasks.

LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
cs.CL · 2024-12-07 · accept · none · ref 167
A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.