{"total":11,"items":[{"citing_arxiv_id":"2605.23296","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Parallel Context Compaction for Long-Horizon LLM Agent Serving","primary_cat":"cs.AI","submitted_at":"2026-05-22T07:12:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Parallel compaction for LLM agent context management provides predictable volume control and reduces wall time versus sequential baselines on HotpotQA and LoCoMo.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23170","ref_index":37,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Positional Failures in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks","primary_cat":"cs.CL","submitted_at":"2026-05-22T02:42:41+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Audits reveal no reasoning benchmark controls position/filler/length jointly; CRE shows LLMs drop up to 88pp on middle-position tasks at 64K context, with diagnostic probe supporting positional cause.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10544","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Where Does Long-Context Supervision Actually Go? 
Effective-Context Exposure Balancing","primary_cat":"cs.CL","submitted_at":"2026-05-11T13:23:21+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"EXACT re-allocates training supervision by inverse frequency of long effective-context targets, improving NoLiMa and RULER scores by 5-18 points on Qwen and LLaMA models without degrading standard QA or reasoning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08580","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents","primary_cat":"cs.MA","submitted_at":"2026-05-09T00:47:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Slipstream uses asynchronous compaction with trajectory-grounded judge validation to improve long-horizon agent accuracy by up to 8.8 percentage points and reduce latency by up to 39.7%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05806","ref_index":24,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Retrieval from Within: An Intrinsic Capability of Attention-Based Models","primary_cat":"cs.LG","submitted_at":"2026-05-07T07:42:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[23] Xi Victoria Lin, Xilun Chen, Mingda Chen, Weijia Shi, Maria Lomeli, Richard James, Pedro Rodriguez, Jacob Kahn, Gergely 
Szilvasy, Mike Lewis, Luke Zettlemoyer, and Scott Yih. RA-DIT: Retrieval-augmented dual instruction tuning. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=22OTbutug9. [24] Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Trung Bui, Ryan A. Rossi, Seunghyun Yoon, and Hinrich Schütze. NoLiMa: Long-Context Evaluation Beyond Literal Matching, 2025. URL https://arxiv.org/abs/2502.05167. [25] Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, and Christopher Ré."},{"citing_arxiv_id":"2604.21816","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows","primary_cat":"cs.AI","submitted_at":"2026-04-23T16:10:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Tool Attention cuts tool-related tokens by 95% and raises context utilization from 24% to 91% in a 120-tool simulation via dynamic gating and lazy loading.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.05151","ref_index":89,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Context Collapse: Barriers to Adoption for Generative AI in Workplace Settings","primary_cat":"cs.CY","submitted_at":"2026-04-06T20:25:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Expert interviews demonstrate that context in generative AI workplace use collapses or rots over time, limiting tool effectiveness and revealing pitfalls in computational context approaches.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"and special-knowledge 
benchmarks is well-acknowledged [31] even if the evaluation ecosystem is benchmarking inapt tasks and often incorporated into training data [80]. As the capabilities claimed by GenAI platforms has expanded, there has been a corresponding interest in ensuring that these tools work well for specific tasks, producing reliable outputs germane to the individual users' contexts [89, 112]. A great deal of GenAI's capabilities come from being able to process linguistic tokens as lengthy, complex textual passages [15] and produce outputs that are relevant to the input. However, producing outputs that take into account aspects of specific users' contexts, which frame their inputs in ways that matter ∗Corresponding Author: emanuel."},{"citing_arxiv_id":"2603.13091","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Reasoning over Video: Evaluating How MLLMs Extract, Integrate, and Reconstruct Spatiotemporal Evidence","primary_cat":"cs.CV","submitted_at":"2026-03-13T15:40:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"VAEX-BENCH shows state-of-the-art MLLMs perform substantially worse on abstractive spatiotemporal reasoning tasks than on matched extractive tasks in video understanding.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.21468","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning","primary_cat":"cs.AI","submitted_at":"2026-01-29T09:47:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MemOCR renders structured memory as images with adaptive visual density to improve long-horizon reasoning under tight context 
budgets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.02780","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MiMo-V2-Flash Technical Report","primary_cat":"cs.CL","submitted_at":"2026-01-06T07:31:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MiMo-V2-Flash is a 309B/15B MoE model trained on 27T tokens with hybrid attention and multi-teacher on-policy distillation that matches larger models like DeepSeek-V3.2 while enabling 2.6x faster decoding via repurposed MTP layers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.05257","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions","primary_cat":"cs.CL","submitted_at":"2025-07-07T17:59:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MemoryAgentBench is a new multi-turn benchmark assessing four memory competencies in LLM agents—accurate retrieval, test-time learning, long-range understanding, and selective forgetting—showing that existing methods fall short.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}