Abstractive Summarization of Large Document Collections Using GPT

Christopher G. Healey; Sengjie Liu

arxiv: 2310.05690 · v1 · pith:X7VKFJHJnew · submitted 2023-10-09 · 💻 cs.AI

classification 💻 cs.AI

keywords document · summarization · bart · abstractive · collections · dataset · documents · individual

This paper proposes a method of abstractive summarization designed to scale to document collections instead of individual documents. Our approach applies a combination of semantic clustering, document size reduction within topic clusters, semantic chunking of a cluster's documents, GPT-based summarization and concatenation, and a combined sentiment and text visualization of each topic to support exploratory data analysis. Statistical comparison of our results to existing state-of-the-art systems BART, BRIO, PEGASUS, and MoCa using ROUGE summary scores showed statistically equivalent performance with BART and PEGASUS on the CNN/Daily Mail test dataset, and with BART on the Gigaword test dataset. This finding is promising since we view document collection summarization as more challenging than individual document summarization. We conclude with a discussion of how issues of scale are
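
For readers unfamiliar with the ROUGE scores mentioned in the abstract, the following is a minimal sketch of ROUGE-1 (unigram overlap between a candidate summary and a reference summary). It is a simplified illustration, not the evaluation code used in the paper: the function name `rouge1` is hypothetical, tokenization is plain lowercased whitespace splitting, and no stemming or stopword handling is applied, so its numbers will generally differ from those of standard ROUGE toolkits.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1: precision, recall, and F1 over unigram overlap.

    Assumption: tokens are lowercased whitespace-separated words
    (no stemming), which is cruder than standard ROUGE toolkits.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Multiset intersection clips each word's match count to the
    # smaller of its counts in the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())

    # Guard against division by zero for empty inputs.
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge1("the cat sat", "the cat sat on the mat")
    print(scores)
```

In this example every candidate word appears in the reference, so precision is 1.0, while only three of the six reference words are covered, so recall is 0.5.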

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Althea: Human-AI Collaboration for Fact-Checking and Critical Reasoning
cs.HC 2025-12 unverdicted novelty 5.0

Althea integrates retrieval-augmented reasoning with varying levels of user scaffolding to improve fact-checking accuracy and foster persistent improvements in critical thinking.