Lost in the Middle: How Language Models Use Long Contexts

[Online] · 2023 · cs.CL · DOI 10.1162/tacl · arXiv 2307.03172

85 Pith papers cite this work. Polarity classification is still indexing.

abstract

While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context. We analyze the performance of language models on two tasks that require identifying relevant information in their input contexts: multi-document question answering and key-value retrieval. We find that performance can degrade significantly when changing the position of relevant information, indicating that current language models do not robustly make use of information in long input contexts. In particular, we observe that performance is often highest when relevant information occurs at the beginning or end of the input context, and significantly degrades when models must access relevant information in the middle of long contexts, even for explicitly long-context models. Our analysis provides a better understanding of how language models use their input context and provides new evaluation protocols for future long-context language models.
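The position-sensitivity protocol described in the abstract can be sketched for the key-value retrieval task as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the prompt wording, the use of random UUID-style strings, and the names `build_kv_prompt`, `accuracy_by_position`, and `answer_fn` (a hypothetical callable wrapping whatever model is being tested) are all invented here for illustration.

```python
import json
import random
import uuid


def build_kv_prompt(num_pairs, target_position, seed=0):
    """Return (prompt, key, value) with the queried pair placed at target_position."""
    rng = random.Random(seed)

    def rand_id():
        # Random UUID-style strings are an assumed stand-in for keys and values.
        return str(uuid.UUID(int=rng.getrandbits(128)))

    target = (rand_id(), rand_id())
    distractors = [(rand_id(), rand_id()) for _ in range(num_pairs - 1)]
    # The same seed yields the same pairs; only the target's position changes.
    pairs = distractors[:target_position] + [target] + distractors[target_position:]
    context = json.dumps(dict(pairs))
    key, value = target
    prompt = (
        "Extract the value for the given key from the JSON object below.\n\n"
        f"{context}\n\nKey: {key}\nValue:"
    )
    return prompt, key, value


def accuracy_by_position(answer_fn, num_pairs, positions, trials=5):
    """Fraction of trials in which answer_fn's output contains the right value."""
    results = {}
    for pos in positions:
        correct = 0
        for trial in range(trials):
            prompt, _, value = build_kv_prompt(num_pairs, pos, seed=trial)
            correct += value in answer_fn(prompt)
        results[pos] = correct / trials
    return results
```

Holding the pairs fixed per seed and moving only the target isolates the effect of position; plotting the returned accuracies against position is one way to look for the dip in the middle that the abstract reports.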

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Submodular Ground-Set Pruning: Monotone Tightness and a Non-Monotone Separation

cs.DS · 2026-05-06 · unverdicted · novelty 8.0

For monotone submodular maximization, containment pruning has a tight 1-1/e factor; for non-monotone objectives, 1/2-ε algorithms exist that exceed known optimization hardness bounds.

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

cs.CL · 2023-10-10 · unverdicted · novelty 8.0

SWE-bench reveals that even top language models like Claude 2 resolve only 1.96% of 2,294 real-world GitHub issues, highlighting a gap in practical coding capabilities.

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

cs.CL · 2023-08-28 · unverdicted · novelty 8.0

LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).

Agentic Interpretation: Lattice-Structured Evidence for LLM-Based Program Analysis

cs.SE · 2026-05-12 · unverdicted · novelty 7.0

Agentic interpretation uses lattices to track LLM judgments on decomposed program claims during analysis.

Measuring What Matters Beyond Text: Evaluating Multimodal Summaries by Quality, Alignment, and Diversity

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

MM-Eval unifies evaluation of multimodal summaries by integrating factual text quality, cross-modal relevance via MLLM judge, and visual diversity via truncated CLIP entropy, then calibrates their combination on human preferences.

Can a Single Message Paralyze the AI Infrastructure? The Rise of AbO-DDoS Attacks through Targeted Mobius Injection

cs.CR · 2026-05-12 · unverdicted · novelty 7.0

Mobius Injection exploits semantic closure in LLM agents to enable single-message AbO-DDoS attacks achieving up to 51x call amplification and 229x latency inflation.

Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.

Concept-Based Abductive and Contrastive Explanations for Behaviors of Vision Models

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

Concept-based abductive and contrastive explanations find minimal high-level concepts that causally determine vision model outcomes on individual images or groups sharing a specified behavior.

SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.

AdaGATE: Adaptive Gap-Aware Token-Efficient Evidence Assembly for Multi-Hop Retrieval-Augmented Generation

cs.CL · 2026-05-04 · unverdicted · novelty 7.0

AdaGATE improves evidence F1 scores on HotpotQA for multi-hop RAG under clean, redundant, and noisy conditions by framing selection as gap-aware token-constrained repair, outperforming baselines while using 2.6x fewer tokens.

Don't Be a Pot Stirrer! Authorized Vector Data Retrieval via Access-Aware Indexing

cs.DB · 2026-05-02 · conditional · novelty 7.0 · 2 refs

Veda and EffVeda partition vectors into disjoint role-combination blocks, apply lattice-based copy and merge operations within a storage budget, index large nodes with HNSW, and use coordinated search with distance bounds to deliver higher throughput at high recall.

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory

cs.CL · 2026-04-29 · unverdicted · novelty 7.0

OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.

Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

LLMs exhibit positional bias and context-dependent scoring patterns when judging document similarity, with each model showing a stable scoring fingerprint but a shared hierarchy of sensitivity to different semantic perturbations.

Beyond Surface Statistics: Robust Conformal Prediction for LLMs via Internal Representations

cs.CL · 2026-04-17 · unverdicted · novelty 7.0

Internal layer-wise entropy reshaping provides nonconformity scores that improve the validity-efficiency trade-off of conformal prediction for LLMs under cross-domain shift compared to text-level baselines.

Closing the Theory-Practice Gap in Spiking Transformers via Effective Dimension

cs.LG · 2026-04-17 · unverdicted · novelty 7.0

Spiking attention is a universal approximator of permutation-equivariant functions with ε-approximation requiring Ω(L_f² nd / ε²) spikes, but low effective dimensions (47-89) allow T=4 timesteps in practice.

IE as Cache: Information Extraction Enhanced Agentic Reasoning

cs.CL · 2026-04-16 · unverdicted · novelty 7.0

IE-as-Cache framework repurposes information extraction as a dynamic cognitive cache to improve agentic reasoning accuracy in LLMs on challenging benchmarks.

In-Context Learning in Speech Language Models: Analyzing the Role of Acoustic Features, Linguistic Structure, and Induction Heads

cs.CL · 2026-04-07 · unverdicted · novelty 7.0

Speech language models show in-context learning where speaking rate affects both accuracy and mimicry, and induction heads are causally necessary for this capability.

MatClaw: An Autonomous Code-First LLM Agent for End-to-End Materials Exploration

cond-mat.mtrl-sci · 2026-04-03 · conditional · novelty 7.0

MatClaw is a code-first LLM agent that autonomously executes end-to-end materials workflows by generating and running Python scripts on remote clusters, achieving reliable code generation via memory architecture and RAG while requiring guided interventions for tacit knowledge.

Internalized Reasoning for Long-Context Visual Document Understanding

cs.CV · 2026-03-31 · unverdicted · novelty 7.0

A synthetic pipeline creates and internalizes reasoning traces in VLMs for long-context visual document understanding, with a 32B model surpassing a 235B model on MMLongBenchDoc and showing 12.4x fewer output tokens.

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

cs.CL · 2024-10-14 · unverdicted · novelty 7.0

LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.

Moshi: a speech-text foundation model for real-time dialogue

eess.AS · 2024-09-17 · accept · novelty 7.0

Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

cs.CL · 2024-05-07 · unverdicted · novelty 7.0

DeepSeek-V2 delivers top-tier open-source LLM performance using only 21B active parameters by compressing the KV cache 93.3% and cutting training costs 42.5% via MLA and DeepSeekMoE.

citing papers explorer

Showing 50 of 85 citing papers.

Submodular Ground-Set Pruning: Monotone Tightness and a Non-Monotone Separation cs.DS · 2026-05-06 · unverdicted · none · ref 24 · internal anchor
For monotone submodular maximization, containment pruning has a tight 1-1/e factor; for non-monotone objectives, 1/2-ε algorithms exist that exceed known optimization hardness bounds.
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? cs.CL · 2023-10-10 · unverdicted · none · ref 114 · internal anchor
SWE-bench reveals that even top language models like Claude 2 resolve only 1.96% of 2,294 real-world GitHub issues, highlighting a gap in practical coding capabilities.
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding cs.CL · 2023-08-28 · unverdicted · none · ref 100 · internal anchor
LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).
Agentic Interpretation: Lattice-Structured Evidence for LLM-Based Program Analysis cs.SE · 2026-05-12 · unverdicted · none · ref 28 · internal anchor
Agentic interpretation uses lattices to track LLM judgments on decomposed program claims during analysis.
Measuring What Matters Beyond Text: Evaluating Multimodal Summaries by Quality, Alignment, and Diversity cs.AI · 2026-05-12 · unverdicted · none · ref 199 · internal anchor
MM-Eval unifies evaluation of multimodal summaries by integrating factual text quality, cross-modal relevance via MLLM judge, and visual diversity via truncated CLIP entropy, then calibrates their combination on human preferences.
Can a Single Message Paralyze the AI Infrastructure? The Rise of AbO-DDoS Attacks through Targeted Mobius Injection cs.CR · 2026-05-12 · unverdicted · none · ref 40 · internal anchor
Mobius Injection exploits semantic closure in LLM agents to enable single-message AbO-DDoS attacks achieving up to 51x call amplification and 229x latency inflation.
Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory cs.AI · 2026-05-11 · unverdicted · none · ref 23 · internal anchor
Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.
Concept-Based Abductive and Contrastive Explanations for Behaviors of Vision Models cs.LG · 2026-05-07 · unverdicted · none · ref 4 · internal anchor
Concept-based abductive and contrastive explanations find minimal high-level concepts that causally determine vision model outcomes on individual images or groups sharing a specified behavior.
SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States cs.CL · 2026-05-06 · unverdicted · none · ref 75 · internal anchor
SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.
AdaGATE: Adaptive Gap-Aware Token-Efficient Evidence Assembly for Multi-Hop Retrieval-Augmented Generation cs.CL · 2026-05-04 · unverdicted · none · ref 23 · internal anchor
AdaGATE improves evidence F1 scores on HotpotQA for multi-hop RAG under clean, redundant, and noisy conditions by framing selection as gap-aware token-constrained repair, outperforming baselines while using 2.6x fewer tokens.
Don't Be a Pot Stirrer! Authorized Vector Data Retrieval via Access-Aware Indexing cs.DB · 2026-05-02 · conditional · none · ref 14 · 2 links · internal anchor
Veda and EffVeda partition vectors into disjoint role-combination blocks, apply lattice-based copy and merge operations within a storage budget, index large nodes with HNSW, and use coordinated search with distance bounds to deliver higher throughput at high recall.
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory cs.CL · 2026-05-01 · unverdicted · none · ref 51 · internal anchor
MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory cs.CL · 2026-04-29 · unverdicted · none · ref 12 · internal anchor
OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.
Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences cs.LG · 2026-04-22 · unverdicted · none · ref 86 · internal anchor
Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.
Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring cs.CL · 2026-04-20 · unverdicted · none · ref 55 · internal anchor
LLMs exhibit positional bias and context-dependent scoring patterns when judging document similarity, with each model showing a stable scoring fingerprint but a shared hierarchy of sensitivity to different semantic perturbations.
Beyond Surface Statistics: Robust Conformal Prediction for LLMs via Internal Representations cs.CL · 2026-04-17 · unverdicted · none · ref 4 · internal anchor
Internal layer-wise entropy reshaping provides nonconformity scores that improve the validity-efficiency trade-off of conformal prediction for LLMs under cross-domain shift compared to text-level baselines.
Closing the Theory-Practice Gap in Spiking Transformers via Effective Dimension cs.LG · 2026-04-17 · unverdicted · none · ref 18 · internal anchor
Spiking attention is a universal approximator of permutation-equivariant functions with ε-approximation requiring Ω(L_f² nd / ε²) spikes, but low effective dimensions (47-89) allow T=4 timesteps in practice.
IE as Cache: Information Extraction Enhanced Agentic Reasoning cs.CL · 2026-04-16 · unverdicted · none · ref 14 · internal anchor
IE-as-Cache framework repurposes information extraction as a dynamic cognitive cache to improve agentic reasoning accuracy in LLMs on challenging benchmarks.
In-Context Learning in Speech Language Models: Analyzing the Role of Acoustic Features, Linguistic Structure, and Induction Heads cs.CL · 2026-04-07 · unverdicted · none · ref 12 · internal anchor
Speech language models show in-context learning where speaking rate affects both accuracy and mimicry, and induction heads are causally necessary for this capability.
MatClaw: An Autonomous Code-First LLM Agent for End-to-End Materials Exploration cond-mat.mtrl-sci · 2026-04-03 · conditional · none · ref 13 · internal anchor
MatClaw is a code-first LLM agent that autonomously executes end-to-end materials workflows by generating and running Python scripts on remote clusters, achieving reliable code generation via memory architecture and RAG while requiring guided interventions for tacit knowledge.
Internalized Reasoning for Long-Context Visual Document Understanding cs.CV · 2026-03-31 · unverdicted · none · ref 30 · internal anchor
A synthetic pipeline creates and internalizes reasoning traces in VLMs for long-context visual document understanding, with a 32B model surpassing a 235B model on MMLongBenchDoc and showing 12.4x fewer output tokens.
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory cs.CL · 2024-10-14 · unverdicted · none · ref 78 · internal anchor
LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.
Moshi: a speech-text foundation model for real-time dialogue eess.AS · 2024-09-17 · accept · none · ref 47 · internal anchor
Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model cs.CL · 2024-05-07 · unverdicted · none · ref 140 · internal anchor
DeepSeek-V2 delivers top-tier open-source LLM performance using only 21B active parameters by compressing the KV cache 93.3% and cutting training costs 42.5% via MLA and DeepSeekMoE.
M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation cs.CL · 2024-02-05 · unverdicted · none · ref 26 · internal anchor
M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.
Extending Context Window of Large Language Models via Positional Interpolation cs.CL · 2023-06-27 · conditional · none · ref 9 · internal anchor
Position Interpolation linearly down-scales position indices to extend RoPE context windows to 32768 tokens with 1000-step fine-tuning, delivering strong long-context results on LLaMA 7B-65B while preserving short-context quality.
OPT: Open Pre-trained Transformer Language Models cs.CL · 2022-05-02 · unverdicted · none · ref 234 · internal anchor
OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
Multitask Prompted Training Enables Zero-Shot Task Generalization cs.LG · 2021-10-15 · conditional · none · ref 2 · internal anchor
Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.
MMSkills: Towards Multimodal Skills for General Visual Agents cs.AI · 2026-05-13 · unverdicted · none · ref 17 · internal anchor
MMSkills turns public interaction trajectories into compact multimodal skill packages that visual agents can consult at runtime to improve decision-making on benchmarks.
LISA: Cognitive Arbitration for Signal-Free Autonomous Intersection Management cs.AI · 2026-05-12 · unverdicted · none · ref 42 · internal anchor
LISA applies LLMs as primary decision-makers for signal-free intersection management, cutting mean control delay by up to 89.1% and maintaining better service levels than fixed-cycle, SCATS, AIM, or GLOSA baselines.
Do Language Models Encode Knowledge of Linguistic Constraint Violations? cs.CL · 2026-05-12 · unverdicted · none · ref 12 · internal anchor
Sparse autoencoder analysis of language model activations finds limited evidence for a unified set of features detecting linguistic constraint violations.
Towards Visually Grounded Multimodal Summarization via Cross-Modal Transformer and Gated Attention cs.AI · 2026-05-12 · unverdicted · none · ref 198 · internal anchor
SPeCTrA-Sum uses hierarchical cross-modal fusion via DVP and DPP-distilled image selection via VRP to generate more accurate and visually grounded multimodal summaries.
Instructions Shape Production of Language, not Processing cs.CL · 2026-05-11 · unverdicted · none · ref 152 · 2 links · internal anchor
Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.
Adversarial SQL Injection Generation with LLM-Based Architectures cs.CR · 2026-05-11 · unverdicted · none · ref 22 · internal anchor
RADAGAS-GPT4o achieves a 22.73% bypass rate against 10 WAFs, succeeding more against AI/ML-based firewalls than rule-based ones.
Aligning LLM Uncertainty with Human Disagreement in Subjectivity Analysis cs.CL · 2026-05-11 · unverdicted · none · ref 17 · 2 links · internal anchor
DPUA is a two-phase framework that aligns LLM uncertainty expressions with human disagreement distributions in subjectivity analysis while preserving task performance.
Bias by Necessity: Impossibility Theorems for Sequential Processing with Convergent AI and Human Validation cs.AI · 2026-05-09 · unverdicted · none · ref 18 · internal anchor
Primacy, anchoring, and order-dependence are architecturally necessary in autoregressive models due to causal masking constraints, with supporting evidence from theorems, LLM fits, and human experiments.
Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents cs.MA · 2026-05-09 · unverdicted · none · ref 14 · 2 links · internal anchor
Slipstream uses asynchronous compaction with trajectory-grounded judge validation to improve long-horizon agent accuracy by up to 8.8 percentage points and reduce latency by up to 39.7%.
The Position Curse: LLMs Struggle to Locate the Last Few Items in a List cs.LG · 2026-05-08 · unverdicted · none · ref 14 · internal anchor
LLMs exhibit the Position Curse, with backward position retrieval in lists lagging far behind forward retrieval, showing only partial gains from PosBench fine-tuning.
A Unified Benchmark for Evaluating Knowledge Graph Construction Methods and Graph Neural Networks cs.LG · 2026-05-06 · unverdicted · none · ref 35 · internal anchor
A dual-purpose benchmark supplies two text-derived knowledge graphs and one expert reference graph on the same biomedical corpus to jointly measure construction method quality and GNN robustness via semi-supervised node classification.
Joint Semantic Token Selection and Prompt Optimization for Interpretable Prompt Learning cs.CV · 2026-05-06 · unverdicted · none · ref 9 · internal anchor
IPL alternates discrete semantic token selection using approximate submodular optimization with continuous prompt optimization to boost both interpretability and task performance in vision-language model adaptation.
Focus and Dilution: The Multi-stage Learning Process of Attention cs.LG · 2026-05-02 · unverdicted · none · ref 1 · internal anchor
In one-layer Transformers trained on Markovian data, attention undergoes a cycle of rapid rank-one condensation, frequency-driven focus on high-frequency tokens, dilution via embedding perturbations, and restart from low-frequency asymmetries.
M-CaStLe: Uncovering Local Causal Structures in Multivariate Space-Time Gridded Data cs.LG · 2026-05-01 · unverdicted · none · ref 261 · internal anchor
M-CaStLe generalizes local stencil-based causal discovery to the multivariate case and decomposes resulting graphs into reaction and spatial components for interpretation in space-time gridded data.
Structure-Aware Chunking for Tabular Data in Retrieval-Augmented Generation cs.CL · 2026-05-01 · unverdicted · none · ref 9 · internal anchor
STC reduces tabular chunk counts by up to 56% versus baselines and raises hybrid MRR to 0.5945 and BM25 Recall@1 to 0.754 by preserving row structure during chunking.
From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction cs.AI · 2026-04-30 · unverdicted · none · ref 8 · internal anchor
Schema-aware iterative extraction turns AI memory into a verified system of record, reaching 90-97% accuracy on extraction and end-to-end memory benchmarks where retrieval baselines score 80-87%.
NuggetIndex: Governed Atomic Retrieval for Maintainable RAG cs.IR · 2026-04-30 · unverdicted · none · ref 27 · internal anchor
NuggetIndex manages atomic nuggets with temporal validity and lifecycle metadata to filter outdated information before ranking, yielding 42% higher nugget recall, 9pp better temporal correctness, and 55% fewer conflicts than passage or unmanaged proposition baselines.
PRAG: End-to-End Privacy-Preserving Retrieval-Augmented Generation cs.CR · 2026-04-29 · unverdicted · none · ref 33 · internal anchor
PRAG delivers end-to-end private RAG with 72-74% recall via non-interactive homomorphic approximations, interactive client assistance, and operation-error estimation to preserve ranking quality.
HubRouter: A Pluggable Sub-Quadratic Routing Primitive for Hybrid Sequence Models cs.LG · 2026-04-24 · unverdicted · none · ref 24 · internal anchor
HubRouter is a sub-quadratic routing primitive using learned hubs that replaces attention layers in hybrid models while delivering competitive perplexity and large throughput gains.
Dissociating Decodability and Causal Use in Bracket-Sequence Transformers cs.CL · 2026-04-24 · unverdicted · none · ref 2 · internal anchor
In Dyck-language transformers, attention patterns causally use top-of-stack information while residual-stream depth and distance signals are decodable yet causally inert.
R$^3$AG: Retriever Routing for Retrieval-Augmented Generation cs.IR · 2026-04-22 · unverdicted · none · ref 23 · internal anchor
R³AG routes queries to retrievers by decomposing capabilities into retrieval quality and generation utility, trained via contrastive learning on document assessments and downstream answer correctness to outperform static methods.
Omission Constraints Decay While Commission Constraints Persist in Long-Context LLM Agents cs.CR · 2026-04-22 · unverdicted · none · ref 1 · internal anchor
Omission constraints in LLM agents decay with conversation length while commission constraints remain stable, creating an invisible security failure.
