pith. machine review for the scientific record.

arxiv: 2605.09463 · v1 · submitted 2026-05-10 · 💻 cs.CL

Recognition: no theorem link

Beyond Position Bias: Shifting Context Compression from Position-Driven to Semantic-Driven

Chen Jason Zhang, Jianxing Yu, Jian Yin, Jiwei Tang, Libin Zheng, Rui Meng, Xinyu Zhang, Zhijing Huang

Pith reviewed 2026-05-12 03:21 UTC · model grok-4.3

classification 💻 cs.CL
keywords context compression · position bias · semantic consistency · large language models · long-context scenarios · soft prompts · inference efficiency · out-of-domain robustness

The pith

SeCo compresses LLM contexts by anchoring compression in semantic space rather than at token positions, eliminating position bias.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current methods for compressing long contexts in large language models rely on fixed insertion positions or the physical token layout, which introduces position bias and fragments semantics. The paper proposes shifting to a semantic-driven approach: SeCo identifies query-relevant tokens as semantic centers and merges the remaining tokens using consistency weights. This is meant to preserve meaning while cutting computation. Tests on 14 benchmarks with two backbone models show gains in accuracy, inference speed, and generalization outside the training domains.

Core claim

The central claim is that Semantic Consistency Context Compression (SeCo) overcomes position bias in soft prompt compression by dynamically selecting query-relevant tokens as semantic centers and aggregating remaining tokens via consistency-weighted merging, which preserves semantic consistency and improves performance on downstream tasks.

What carries the argument

SeCo (Semantic Consistency Context Compression), which anchors compression in semantic space by selecting query-relevant tokens as semantic centers and aggregating the remaining tokens through consistency-weighted merging.
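The abstract names the two operators but not their form. As a concreteness aid, the sketch below shows one way such a pipeline could be realized, assuming cosine-similarity relevance scoring, top-k center selection, and softmax consistency weights; the function and every parameter in it are illustrative assumptions, not the paper's implementation or its released code.

    import torch
    import torch.nn.functional as F

    def semantic_compress(query_emb, ctx_emb, num_centers=16, temperature=0.1):
        """Hypothetical sketch of semantic-driven compression (not SeCo's actual code).

        query_emb: (q_len, d) query token embeddings
        ctx_emb:   (c_len, d) context token embeddings
        Returns a (num_centers, d) compressed representation.
        """
        q = F.normalize(query_emb, dim=-1)
        c = F.normalize(ctx_emb, dim=-1)

        # 1. Score each context token by its best cosine similarity to any query token.
        relevance = (c @ q.T).max(dim=-1).values             # (c_len,)

        # 2. Keep the top-k most query-relevant tokens as semantic centers.
        center_idx = relevance.topk(num_centers).indices
        centers = ctx_emb[center_idx]                         # (k, d)

        # 3. Consistency weights: softmax over token-center cosine similarity,
        #    computed purely in embedding space, never from token positions.
        sim = c @ F.normalize(centers, dim=-1).T              # (c_len, k)
        sim[center_idx] = float("-inf")                       # merge only the non-center tokens
        weights = F.softmax(sim / temperature, dim=0)         # each center's column sums to 1

        # 4. Merge the remaining tokens into their centers by weighted pooling.
        pooled = weights.T @ ctx_emb                          # (k, d)
        return centers + pooled

The only property this sketch is meant to exhibit is that no step consults token positions; whether SeCo's actual operators have that property is exactly what the referee questions below.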

If this is right

  • Models can handle longer contexts with less latency and without position-induced performance drops.
  • Downstream task accuracy improves consistently across multiple benchmarks.
  • Out-of-domain robustness increases due to reduced semantic fragmentation.
  • Compression becomes more stable without needing position-specific tuning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar semantic anchoring could extend to compressing other data types like images or code sequences.
  • Integrating SeCo with retrieval methods might further enhance long-context reasoning.
  • Testing on even longer sequences would verify if the semantic consistency holds at scale.

Load-bearing premise

That identifying query-relevant tokens as semantic centers and performing consistency-weighted merging will reliably preserve critical information without introducing new semantic distortions.

What would settle it

Running SeCo on a benchmark whose queries make relevance deliberately hard to determine, and checking whether accuracy falls below that of position-based methods or shows persistent inconsistencies.
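A hedged sketch of how that comparison could be wired up, assuming the caller supplies the compressors (for example SeCo and a fixed-position baseline) and an answering function for the backbone model; all names and parameters here are illustrative, none come from the paper:

    def stress_test(examples, compress_fns, answer_fn, ratio=0.25):
        """Compare compressors on adversarial-relevance examples at one matched ratio.

        examples:     list of dicts with 'context', 'query', 'gold' strings
        compress_fns: dict of name -> fn(context, query, ratio) returning a compressed context
        answer_fn:    fn(compressed_context, query) returning the backbone model's answer
        """
        correct = {name: 0 for name in compress_fns}
        for ex in examples:
            for name, compress in compress_fns.items():
                ctx = compress(ex["context"], ex["query"], ratio=ratio)  # same ratio for every method
                pred = answer_fn(ctx, ex["query"])
                correct[name] += int(pred.strip().lower() == ex["gold"].strip().lower())  # exact match
        return {name: c / len(examples) for name, c in correct.items()}

Persistent losses for the semantic-driven compressor on such adversarial queries, at a matched compression ratio, would be the kind of result that undercuts the core claim.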

Figures

Figures reproduced from arXiv: 2605.09463 by Chen Jason Zhang, Jianxing Yu, Jian Yin, Jiwei Tang, Libin Zheng, Rui Meng, Xinyu Zhang, Zhijing Huang.

Figure 1. Comparison between existing methods and SeCo. (a) …
Figure 2. The Training Paradigm of SeCo. SeCo is based on an encoder-decoder architecture.
Figure 3. The Compression Process of SeCo. Specifically, it sequentially performs four steps: …
Figure 4. Additional results on long context and summarization benchmarks compared with task …
Figure 5. Without modifying the attention mask, the bias factor …
Figure 6. Exact Match (EM) across both in-domain and out-of-domain QA datasets at different …
Original abstract

Large Language Models (LLMs) have demonstrated exceptional performance across diverse tasks. However, their deployment in long-context scenarios faces high computational overhead and information redundancy. While soft prompt compression has emerged as a promising way to mitigate these costs by compressing sequences into compact embeddings, existing paradigms remain fundamentally constrained by position bias: they primarily rely on learnable tokens insertion at fixed positions or group tokens according to their physical token layout, thereby inducing performance instability and semantic fragmentation. To overcome this bottleneck, we propose Semantic Consistency Context Compression (SeCo), a method that shifts context compression from position-driven to semantic-driven. Rather than constraint by physical token layout, SeCo dynamically anchors compression directly in the semantic space by selecting query-relevant tokens as semantic centers and aggregating remaining tokens via consistency-weighted merging. This design inherently preserves semantic consistency while eliminating position bias. Extensive experiments on 14 benchmarks across two backbone models demonstrate that SeCo consistently shows superiority in downstream tasks, inference latency, and out-of-domain robustness. The code is available at https://anonymous.4open.science/r/seco-EE5E.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes Semantic Consistency Context Compression (SeCo) to address position bias in soft prompt compression for long-context LLMs. Instead of relying on fixed-position learnable tokens or physical token layout, SeCo selects query-relevant tokens as semantic centers and aggregates remaining tokens via consistency-weighted merging. The authors claim this semantic-driven approach inherently eliminates position bias while preserving semantic consistency, with experiments on 14 benchmarks across two backbone models showing gains in downstream performance, inference latency, and out-of-domain robustness. Code is released.

Significance. If the core claim holds—that semantic center selection and consistency-weighted merging operate independently of positional encodings and without introducing new distortions—this would address a genuine limitation in existing compression methods and improve robustness for long-context LLM applications. The public code release is a strength for reproducibility. However, the current presentation provides only high-level experimental summaries without sufficient controls or implementation details to confirm the position-bias elimination.

major comments (3)
  1. [Abstract] The central claim that SeCo 'inherently preserves semantic consistency while eliminating position bias' rests on the unverified assumption that query-relevant token selection and consistency-weighted merging use no positional signals. If relevance scoring or consistency measurement is performed via the backbone model's attention or embeddings (which encode positional information), or if merging weights correlate with the original token layout, position bias is relocated rather than removed. No implementation details or ablation isolating semantic from residual positional effects are provided to support this.
  2. [Experimental results] Claims of 'consistent superiority' on 14 benchmarks, as summarized, lack any details on baselines, statistical significance testing, error bars, the exact merging procedure, or controls for compression ratio. Without these, it is impossible to determine whether the reported gains stem from bias removal or from other factors such as differing compression ratios or tuning.
  3. [Method description] The approach introduces free parameters ('number of semantic centers', 'consistency threshold for merging') and invented entities ('semantic center', 'consistency-weighted merging'). No ablation studies demonstrate that these choices are robust across tasks or that they avoid reintroducing position-like biases through task-specific tuning.
minor comments (1)
  1. [Abstract] Abstract: 'constraint by physical token layout' should read 'constrained by physical token layout'.
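On the first major comment, a quick way to see whether a backbone's "semantic" embeddings carry positional signal is to embed the same sentence at different offsets and compare the resulting token states. The sketch below uses gpt2 purely as an example encoder; the model choice, the filler text, and the probe itself are editorial assumptions, not something the referee or the authors specify.

    import torch
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModel.from_pretrained("gpt2").eval()
    probe = "Paris is the capital of France."

    def probe_states(prefix_words):
        """Hidden states of the probe sentence after `prefix_words` filler words."""
        ids = tok("the " * prefix_words + probe, return_tensors="pt")
        with torch.no_grad():
            states = model(**ids).last_hidden_state[0]         # (seq_len, d)
        return states[-len(tok(probe)["input_ids"]):]          # keep only the probe's tokens

    early, late = probe_states(5), probe_states(200)
    cos = torch.nn.functional.cosine_similarity(early, late, dim=-1)
    print(cos.mean())  # values well below 1.0 indicate position leaking into the embeddings

If relevance scores or consistency weights are computed from states like these, some position dependence survives the semantic-driven framing, which is what the requested ablation would need to quantify.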

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights areas where additional clarity and rigor will strengthen the manuscript. We address each major comment point by point below, outlining specific revisions.

Point-by-point responses
  1. Referee: [Abstract] The central claim that SeCo 'inherently preserves semantic consistency while eliminating position bias' rests on the unverified assumption that query-relevant token selection and consistency-weighted merging use no positional signals. If relevance scoring or consistency measurement is performed via the backbone model's attention or embeddings (which encode positional information), or if merging weights correlate with the original token layout, position bias is relocated rather than removed. No implementation details or ablation isolating semantic from residual positional effects are provided to support this.

    Authors: We agree that the manuscript would benefit from explicit clarification on this point. In SeCo, semantic centers are selected via cosine similarity between query embeddings and context token embeddings (computed independently of positional encodings), and consistency-weighted merging uses pairwise semantic similarity scores for aggregation weights. This design operates in semantic space without reference to token positions or layout. However, we acknowledge the absence of implementation pseudocode and targeted ablations isolating residual positional effects. In the revision, we will add a new subsection detailing the exact computation (with equations) and an ablation comparing SeCo variants with/without positional information in the backbone embeddings, confirming that performance gains persist when positional signals are masked. revision: yes

  2. Referee: [Experimental results] Claims of 'consistent superiority' on 14 benchmarks, as summarized, lack any details on baselines, statistical significance testing, error bars, the exact merging procedure, or controls for compression ratio. Without these, it is impossible to determine whether the reported gains stem from bias removal or from other factors such as differing compression ratios or tuning.

    Authors: We accept this critique regarding insufficient experimental transparency. The full paper compares against multiple position-based baselines (e.g., fixed-position soft prompts and layout-grouped merging) at matched compression ratios, but the summary presentation omitted key details. In the revised version, we will expand the experimental section to include: full baseline descriptions and hyperparameters, paired statistical significance tests with p-values, error bars from 5 random seeds, pseudocode for the merging procedure, and explicit controls verifying identical compression ratios across methods. These additions will demonstrate that superiority arises from the semantic-driven mechanism rather than confounding factors. revision: yes

  3. Referee: [Method description] The approach introduces free parameters ('number of semantic centers', 'consistency threshold for merging') and invented entities ('semantic center', 'consistency-weighted merging'). No ablation studies demonstrate that these choices are robust across tasks or that they avoid reintroducing position-like biases through task-specific tuning.

    Authors: We will clarify terminology in the revision: 'semantic center' refers to query-relevant anchor tokens selected by embedding similarity, and 'consistency-weighted merging' denotes aggregation weighted by semantic coherence scores. The parameters are standard hyperparameters tuned on a held-out validation set. To address the lack of robustness evidence, we will add ablation studies varying the number of centers and the threshold across all 14 benchmarks, reporting performance stability and confirming that fixed (non-task-specific) values suffice without introducing layout-dependent biases. This will show that the choices are neither fragile nor a covert reintroduction of position dependence. revision: yes
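The second response commits to paired significance tests and error bars over five seeds. A minimal sketch of that analysis, with synthetic placeholder scores standing in for the real per-seed results the revision would have to supply:

    import numpy as np
    from scipy.stats import ttest_rel

    rng = np.random.default_rng(0)
    # Synthetic stand-ins for per-seed exact-match scores; not results from the paper.
    seco     = rng.normal(71.0, 0.5, size=5)
    baseline = rng.normal(70.0, 0.5, size=5)

    diff = seco - baseline
    res = ttest_rel(seco, baseline)                  # paired test across matched seeds
    stderr = diff.std(ddof=1) / np.sqrt(len(diff))
    print(f"mean gain {diff.mean():.2f} +/- {stderr:.2f} (SE), "
          f"paired t = {res.statistic:.2f}, p = {res.pvalue:.3f}")

Reported per benchmark at matched compression ratios, this is what would separate a genuine effect of the semantic-driven mechanism from seed noise or tuning.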

Circularity Check

0 steps flagged

No circularity: the method proposal and the empirical claims that support it are established independently of one another

full rationale

The paper introduces SeCo as a new semantic-driven compression technique via query-relevant token selection and consistency-weighted merging, asserting it eliminates position bias by design. No equations, derivations, or steps in the abstract or description reduce the claimed superiority or bias elimination to a fitted parameter, self-definition, or self-citation chain. The central claims rest on the proposed algorithm and external benchmark results across 14 tasks, which serve as independent validation rather than tautological restatement. No load-bearing uniqueness theorems, ansatzes, or renamings of known results are invoked in a self-referential manner.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 2 invented entities

Abstract provides no explicit equations or hyperparameter lists, so the ledger is populated at the level of conceptual commitments only.

free parameters (2)
  • number of semantic centers
    Must be chosen or derived per query; not specified how it is set or whether it is fixed across tasks.
  • consistency threshold for merging
    Weighting and aggregation step implies at least one tunable similarity cutoff or weighting scheme.
axioms (1)
  • domain assumption: Semantic similarity between tokens can be measured reliably in the model's embedding space without reference to position.
    Core premise of shifting from position to semantic space; invoked when selecting centers and computing consistency weights.
invented entities (2)
  • semantic center (no independent evidence)
    purpose: Anchor point for compression that replaces fixed-position tokens.
    New conceptual unit introduced to drive the selection and merging process.
  • consistency-weighted merging (no independent evidence)
    purpose: Aggregation operator that replaces position-based grouping.
    New operation whose exact formulation is not detailed in the abstract.

pith-pipeline@v0.9.0 · 5506 in / 1478 out tokens · 48657 ms · 2026-05-12T03:21:15.482268+00:00 · methodology

