When Less is More: The LLM Scaling Paradox in Context Compression
Scaling up model parameters has long been a prevalent training paradigm, driven by the assumption that larger models yield superior generation capabilities. However, under lossy context compression in a compressor--decoder setup, we find a \textbf{\textit{Size-Fidelity Paradox}}: increasing compressor size can lessen the faithfulness of reconstructed contexts even though reconstruction error decreases. Across 27 compressor setups spanning model families, scales, and compression rates, we trace this paradox to two dominant factors: 1) \textit{knowledge overwriting}: larger models increasingly replace source facts with their own prior beliefs, \textit{e.g.}, ``the white strawberry'' $\to$ ``the red strawberry''; and 2) \textit{semantic drift}: larger models tend to paraphrase or restructure content instead of reproducing it verbatim, \textit{e.g.}, ``Alice hit Bob'' $\to$ ``Bob hit Alice''. Interestingly, this paradox persists across varied settings, with mid-sized compressors often outperforming larger ones in faithful recovery. By analyzing the compressed memory via embedding geometry and reconstruction determinacy, we further reveal that larger compressors tend to organize memory across broader semantic subspaces, yielding more ambiguous representations that are prone to overwriting, drift, and weakened recovery. These findings complement existing evaluations of context compression and expose a breakdown of scaling laws when the objective shifts from plausible generation to faithful preservation.
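To make the two failure modes concrete, here is a minimal, self-contained Python sketch (not the paper's evaluation code; the n-gram heuristic, function names, and example handling are assumptions for illustration) that scores a reconstruction against its source and separates overwriting-style substitutions from drift-style reorderings:

```python
from collections import Counter

def ngram_fidelity(source: str, reconstruction: str, n: int = 2) -> float:
    """Toy faithfulness proxy: fraction of source word n-grams preserved verbatim."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    src, rec = ngrams(source), ngrams(reconstruction)
    if not src:
        return 1.0
    preserved = sum(min(count, rec[gram]) for gram, count in src.items())
    return preserved / sum(src.values())

def failure_mode(source: str, reconstruction: str) -> str:
    """Crude heuristic: identical text -> faithful; same bag of words but different
    order -> semantic drift; changed words -> knowledge-overwriting candidate."""
    src, rec = source.lower().split(), reconstruction.lower().split()
    if src == rec:
        return "faithful"
    if Counter(src) == Counter(rec):
        return "semantic drift"
    return "knowledge overwriting (candidate)"

# The two examples from the abstract.
pairs = [
    ("the white strawberry", "the red strawberry"),  # source fact replaced by a prior belief
    ("Alice hit Bob", "Bob hit Alice"),               # same words, restructured meaning
]
for src, rec in pairs:
    print(f"{src!r} -> {rec!r}: "
          f"fidelity={ngram_fidelity(src, rec):.2f}, mode={failure_mode(src, rec)}")
```

Exact n-gram overlap deliberately rewards verbatim reproduction, echoing the abstract's shift from plausible generation to faithful preservation; the bag-of-words test distinguishes reorderings from substitutions only in this toy setting.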
Forward citations
Cited by 1 Pith paper
- SAGE: Selective Attention-Guided Extraction for Token-Efficient Document Indexing
SAGE is a training-free context reduction method that converts attention signals from a small LLM into a differential relevance heatmap to select top units for downstream QA, achieving competitive accuracy at 10% toke...
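The blurb above describes the approach only at a high level. As a rough illustration (not SAGE's actual algorithm; the function names, array shapes, and selection rule below are assumptions), attention-guided extraction can be thought of as ranking document units by the attention mass a small model assigns them and keeping the top-scoring ones:

```python
import numpy as np

def select_top_units(attention: np.ndarray, unit_spans, keep_ratio: float = 0.1):
    """Toy attention-guided selection: score each unit (e.g., a sentence) by the mean
    attention mass its tokens receive, then keep the highest-scoring units.

    attention  : (num_query_tokens, num_doc_tokens) attention weights from a small LM
    unit_spans : list of (start, end) token index ranges, one per document unit
    keep_ratio : fraction of units to retain
    """
    token_scores = attention.mean(axis=0)                        # per-token relevance
    unit_scores = [token_scores[s:e].mean() for s, e in unit_spans]
    k = max(1, int(round(keep_ratio * len(unit_spans))))
    keep = np.argsort(unit_scores)[::-1][:k]
    return sorted(keep.tolist())                                 # preserve document order

# Hypothetical example: 3 units over 12 tokens, with random attention as a stand-in.
rng = np.random.default_rng(0)
attn = rng.random((4, 12))
spans = [(0, 4), (4, 8), (8, 12)]
print(select_top_units(attn, spans, keep_ratio=0.34))
```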