hub

and Yoon, Seunghyun and Sch

Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Trung Bui, Ryan A · 2025 · arXiv 2502.05167

14 Pith papers cite this work. Polarity classification is still indexing.

14 Pith papers citing it

read on arXiv browse 14 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

ContextEcho: A Benchmark for Persona Drift in Long Agentic-Coding Sessions

cs.CL · 2026-05-22 · unverdicted · novelty 7.0

ContextEcho benchmark shows persona drift occurs across 23 frontier models in long agentic-coding sessions, is not reliably reset by compaction, and can be restored by single-shot anchors with mode-dependent effects.

Positional Failures in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks

cs.CL · 2026-05-22 · conditional · novelty 7.0

Audits reveal no reasoning benchmark controls position/filler/length jointly; CRE shows LLMs drop up to 88pp on middle-position tasks at 64K context, with diagnostic probe supporting positional cause.

Reasoning over Video: Evaluating How MLLMs Extract, Integrate, and Reconstruct Spatiotemporal Evidence

cs.CV · 2026-03-13 · unverdicted · novelty 7.0

VAEX-BENCH shows state-of-the-art MLLMs perform substantially worse on abstractive spatiotemporal reasoning tasks than on matched extractive tasks in video understanding.

Separating Semantic Competition from Context Length in RAG Reading

cs.CL · 2026-05-26 · unverdicted · novelty 6.0

A matched-control protocol on SQuAD recovers 4.5-6 EM points and 0.057-0.068 F1 in two small models by swapping hard distractors for easier ones, isolating semantic competition from context length.

Parallel Context Compaction for Long-Horizon LLM Agent Serving

cs.AI · 2026-05-22 · unverdicted · novelty 6.0

Parallel compaction for LLM agent context management provides predictable volume control and reduces wall time versus sequential baselines on HotpotQA and LoCoMo.

Where Does Long-Context Supervision Actually Go? Effective-Context Exposure Balancing

cs.CL · 2026-05-11 · conditional · novelty 6.0

EXACT re-allocates training supervision by inverse frequency of long effective-context targets, improving NoLiMa and RULER scores by 5-18 points on Qwen and LLaMA models without degrading standard QA or reasoning.

Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents

cs.MA · 2026-05-09 · unverdicted · novelty 6.0

Slipstream uses asynchronous compaction with trajectory-grounded judge validation to improve long-horizon agent accuracy by up to 8.8 percentage points and reduce latency by up to 39.7%.

Retrieval from Within: An Intrinsic Capability of Attention-Based Models

cs.LG · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.

Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces

cs.CL · 2026-06-05 · unverdicted · novelty 5.0

Reasoning in large output spaces proceeds via shortlisting then fine-grained reasoning; this characterization enables a mechanistic distillation strategy that outperforms standard distillation.

Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows

cs.AI · 2026-04-23 · unverdicted · novelty 5.0

Tool Attention cuts tool-related tokens by 95% and raises context utilization from 24% to 91% in a 120-tool simulation via dynamic gating and lazy loading.

Context Collapse: Barriers to Adoption for Generative AI in Workplace Settings

cs.CY · 2026-04-06 · unverdicted · novelty 5.0

Expert interviews demonstrate that context in generative AI workplace use collapses or rots over time, limiting tool effectiveness and revealing pitfalls in computational context approaches.

MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning

cs.AI · 2026-01-29 · unverdicted · novelty 5.0

MemOCR renders structured memory as images with adaptive visual density to improve long-horizon reasoning under tight context budgets.

MiMo-V2-Flash Technical Report

cs.CL · 2026-01-06 · unverdicted · novelty 5.0

MiMo-V2-Flash is a 309B/15B MoE model trained on 27T tokens with hybrid attention and multi-teacher on-policy distillation that matches larger models like DeepSeek-V3.2 while enabling 2.6x faster decoding via repurposed MTP layers.

Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions

cs.CL · 2025-07-07

citing papers explorer

Showing 2 of 2 citing papers after filters.

Retrieval from Within: An Intrinsic Capability of Attention-Based Models cs.LG · 2026-05-07 · unverdicted · none · ref 24 · 2 links
Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.
Context Collapse: Barriers to Adoption for Generative AI in Workplace Settings cs.CY · 2026-04-06 · unverdicted · none · ref 89
Expert interviews demonstrate that context in generative AI workplace use collapses or rots over time, limiting tool effectiveness and revealing pitfalls in computational context approaches.

and Yoon, Seunghyun and Sch

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer