Where Does Long-Context Supervision Actually Go? Effective-Context Exposure Balancing
Rossi, Seunghyun Yoon, and Hinrich Schütze
Six Pith papers cite this work; representative citing papers (2026) are listed below.
EXACT re-allocates training supervision by inverse frequency of long effective-context targets, improving NoLiMa and RULER scores by 5-18 points on Qwen and LLaMA models without degrading standard QA or reasoning.
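The summary compresses EXACT's core move: count how common each effective-context length is among training targets and up-weight the rare long ones. A minimal sketch of that inverse-frequency reweighting, assuming a per-example effective-context statistic is available and that the weights scale a per-example loss or sampling probability (names below are illustrative, not the paper's):

```python
# Minimal sketch of inverse-frequency exposure balancing (illustrative names,
# not the paper's code): bucket examples by how long their target's effective
# context is, then weight each example by the inverse frequency of its bucket.
from collections import Counter
from dataclasses import dataclass

@dataclass
class ExampleStat:
    example_id: int
    effective_ctx_len: int  # assumed: tokens of context the target actually depends on

def exposure_weights(stats, bucket_size=1024):
    """Return per-example weights, normalized so their mean is 1.0."""
    buckets = [s.effective_ctx_len // bucket_size for s in stats]
    counts = Counter(buckets)
    weights = [len(stats) / (len(counts) * counts[b]) for b in buckets]
    return {s.example_id: w for s, w in zip(stats, weights)}

# Mostly short effective contexts plus two long ones: the long targets get ~8x the weight.
stats = [ExampleStat(i, 200) for i in range(8)] + [ExampleStat(8, 30_000), ExampleStat(9, 62_000)]
print({k: round(v, 2) for k, v in exposure_weights(stats).items()})
```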
-
Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents
Slipstream uses asynchronous compaction with trajectory-grounded judge validation to improve long-horizon agent accuracy by up to 8.8 percentage points and reduce latency by up to 39.7%.
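A rough sketch of the mechanism the summary names, with compaction running off the agent's critical path and a judge validating the compacted summary against the raw trajectory before it is accepted; summarize() and judge_supports() below are stand-ins for LLM calls, not Slipstream's interfaces:

```python
# Rough sketch of asynchronous compaction with a trajectory-grounded judge;
# summarize() and judge_supports() are placeholders for LLM calls, not Slipstream's APIs.
from concurrent.futures import ThreadPoolExecutor

def summarize(steps):
    # Stand-in compactor: keep tool names and truncated observations.
    return [f"{s['tool']} -> {s['result'][:40]}" for s in steps]

def judge_supports(summary, steps):
    # Stand-in judge: the summary is validated against the raw trajectory
    # (here, every tool that was used must still be mentioned) before acceptance.
    return all(any(s["tool"] in line for line in summary) for s in steps)

class CompactingContext:
    def __init__(self, keep_recent=4):
        self.steps, self.compacted = [], []
        self.keep_recent = keep_recent
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = None  # (old_steps, future) while a compaction is in flight

    def add_step(self, step):
        self.steps.append(step)
        # Compaction is launched in the background so the agent never blocks on it.
        if self._pending is None and len(self.steps) > 2 * self.keep_recent:
            old = self.steps[:-self.keep_recent]
            self._pending = (old, self._pool.submit(summarize, old))

    def context(self):
        # Fold a finished compaction into the context only if the judge accepts it.
        if self._pending and self._pending[1].done():
            old, fut = self._pending
            summary = fut.result()
            if judge_supports(summary, old):
                self.compacted += summary
                self.steps = self.steps[len(old):]
            self._pending = None
        return self.compacted + [f"{s['tool']}: {s['result']}" for s in self.steps]
```

The point the summary makes is that summarization runs off the critical path and its output is checked against the raw trajectory rather than trusted blindly.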
-
Retrieval from Within: An Intrinsic Capability of Attention-Based Models
Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.
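A hedged illustration of the scoring step implied by the summary, assuming cached key vectors for pre-encoded chunks and decoder query vectors for the question tokens; this is a generic attention-mass ranking, not the paper's implementation:

```python
# Illustrative sketch, not the paper's implementation: rank pre-encoded chunks by the
# attention mass the question's query vectors place on each chunk's cached keys.
import numpy as np

def score_chunks(question_q, chunk_keys, top_k=2):
    """question_q: (n_q, d) decoder query vectors for the question tokens.
    chunk_keys: list of (n_i, d) cached key matrices, one per pre-encoded chunk.
    Returns indices of the top_k chunks by average attention mass."""
    d = question_q.shape[-1]
    logits = np.concatenate([question_q @ k.T for k in chunk_keys], axis=1) / np.sqrt(d)
    att = np.exp(logits - logits.max(axis=1, keepdims=True))
    att /= att.sum(axis=1, keepdims=True)  # softmax over all cached tokens at once
    bounds = np.cumsum([0] + [k.shape[0] for k in chunk_keys])
    mass = [att[:, bounds[i]:bounds[i + 1]].sum(axis=1).mean() for i in range(len(chunk_keys))]
    return np.argsort(mass)[::-1][:top_k]

# Toy usage with random vectors standing in for real KV-cache entries.
rng = np.random.default_rng(0)
question_q = rng.normal(size=(3, 64))
chunk_keys = [rng.normal(size=(n, 64)) for n in (10, 7, 12)]
print(score_chunks(question_q, chunk_keys))
```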
-
Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows
Tool Attention cuts tool-related tokens by 95% and raises context utilization from 24% to 91% in a 120-tool simulation via dynamic gating and lazy loading.
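The gating-plus-lazy-loading idea can be sketched as follows; the tool names, the overlap-based relevance score, and load_schema() are placeholders for illustration, not the paper's mechanism:

```python
# Hedged sketch of dynamic tool gating plus lazy schema loading; everything here
# is a placeholder, not the paper's actual gating or scoring.
TOOL_STUBS = {
    "search_flights": "find flights between two airports on a date",
    "book_hotel": "reserve a hotel room in a city",
    "get_weather": "current weather for a location",
}

def load_schema(name):
    # Full JSON schemas stay out of context until a tool is actually gated in.
    return {"name": name, "parameters": {"note": "full schema loaded on demand"}}

def gate_tools(query, top_k=2):
    # Toy relevance: word overlap between the query and each one-line stub.
    q = set(query.lower().split())
    ranked = sorted(TOOL_STUBS, key=lambda t: -len(q & set(TOOL_STUBS[t].split())))
    return {name: load_schema(name) for name in ranked[:top_k]}

print(list(gate_tools("what is the weather in paris and any hotel room deals")))
```

Only the gated-in tools' full schemas ever enter the prompt, which is where the token savings described in the summary would come from.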
-
Context Collapse: Barriers to Adoption for Generative AI in Workplace Settings
Expert interviews show that the context generative AI tools depend on in workplace settings collapses or rots over time, limiting tool effectiveness and exposing pitfalls in computational approaches to context.
-
MiMo-V2-Flash Technical Report
MiMo-V2-Flash is a 309B-parameter MoE model with 15B active parameters, trained on 27T tokens with hybrid attention and multi-teacher on-policy distillation; it matches larger models such as DeepSeek-V3.2 while enabling 2.6x faster decoding via repurposed MTP layers.
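The decoding speedup from repurposed MTP layers presumably follows the self-speculative pattern, where a cheap draft head proposes several tokens and the full model verifies them; the sketch below shows only that generic draft-and-verify loop with stand-in functions, not MiMo's pipeline:

```python
# Generic draft-and-verify (self-speculative) decoding loop; full_next and draft_next
# are stand-in callables, not MiMo's interfaces, and real systems verify all draft
# positions in a single batched forward pass rather than one call per token.
def speculative_decode(prompt, full_next, draft_next, k=4, max_new=16):
    """full_next(seq) -> next token from the full model (authoritative).
    draft_next(seq) -> next token from a cheap draft head (fast, approximate)."""
    seq = list(prompt)
    target = len(prompt) + max_new
    while len(seq) < target:
        # 1. Draft k tokens cheaply.
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2. Verify: keep the agreeing prefix, then take the full model's token
        #    at the first disagreement, so output matches full-model greedy decoding.
        accepted = []
        for tok in draft:
            expected = full_next(seq + accepted)
            accepted.append(expected)
            if expected != tok:
                break
        seq += accepted
    return seq[:target]

# Toy models: the full model counts mod 10; the draft head is right most of the time.
full_next = lambda seq: (seq[-1] + 1) % 10
draft_next = lambda seq: (seq[-1] + 1) % 10 if len(seq) % 7 else 0
print(speculative_decode([1, 2, 3], full_next, draft_next))
```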