hub

Transactions of the association for computational linguistics , volume=

Lost in the middle: How language models use long contexts , author=

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

browse 10 citing papers

hub tools

JSON dossier citing papers JSON

representative citing papers

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.

GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration

cs.AI · 2026-03-08 · unverdicted · novelty 7.0

GraphBit is a DAG-based engine-orchestrated framework for agentic LLMs that achieves 67.6% accuracy with zero hallucinations on GAIA benchmarks.

OnePred: Next-Query Prediction via Recursive Intent Memory in Multi-Turn Conversations

cs.CL · 2026-05-22 · unverdicted · novelty 6.0

OnePred maintains a recursively updated intent memory and uses two-stage RL to predict next queries, cutting token use by up to 22x while outperforming baselines on a new NQP-Bench dataset.

GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

cs.CL · 2026-05-19 · unverdicted · novelty 6.0

GoLongRL releases a 23K-sample open long-context RL dataset spanning 9 tasks and introduces TMN-Reweight to improve multitask optimization, achieving performance comparable to much larger models under GRPO.

When Evidence Conflicts: Uncertainty and Order Effects in Retrieval-Augmented Biomedical Question Answering

cs.CL · 2026-05-13 · conditional · novelty 6.0

Conflicting biomedical evidence triggers order-dependent prediction flips in RAG LLMs, and a new abstention score combining confidence with conflict detection raises selective accuracy by 7-33 points in the hardest conditions.

When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

Attention to goal tokens declines in multi-turn LLM interactions while residual representations often retain decodable goal information, and the gap between these predicts whether goal-conditioned behavior survives.

ProCompNav: Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries

cs.AI · 2026-05-07 · unverdicted · novelty 6.0 · 3 refs

ProCompNav builds a candidate pool from ambiguous queries then uses pool-splitting binary questions for disambiguation, improving success rate and shortening responses on CoIN-Bench and TextNav.

The Model Knows, the Decoder Finds: Future Value Guided Particle Power Sampling

cs.AI · 2026-05-04 · unverdicted · novelty 6.0 · 2 refs

APPS approximates power targets p(x)^alpha via parallel particle propagation with proposal-corrected reweighting and future-value-guided selection at block boundaries, improving accuracy-runtime trade-offs in training-free decoding.

When Skills Don't Help: A Negative Result on Procedural Knowledge for Tool-Grounded Agents in Offensive Cybersecurity

cs.AI · 2026-05-19 · unverdicted · novelty 5.0

Re-analysis of MCP-grounded CTF agent runs shows Skills add only 8.9 pp (p=0.71) when tool feedback is strict and low-latency, supporting the hypothesis that high environment-feedback bandwidth reduces the marginal value of curated procedural knowledge.

UserGPT Technical Report

cs.IR · 2026-05-09 · unverdicted · novelty 5.0

UserGPT introduces a generative LLM framework with a behavior simulation engine, semantization module, and DF-GRPO post-training that scores 0.7325 on tag prediction and 0.7528 on summary generation on HPR-Bench while compressing records by up to 97.9%.

citing papers explorer

Showing 10 of 10 citing papers.

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding cs.CV · 2026-05-11 · unverdicted · none · ref 67
EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.
GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration cs.AI · 2026-03-08 · unverdicted · none · ref 22
GraphBit is a DAG-based engine-orchestrated framework for agentic LLMs that achieves 67.6% accuracy with zero hallucinations on GAIA benchmarks.
OnePred: Next-Query Prediction via Recursive Intent Memory in Multi-Turn Conversations cs.CL · 2026-05-22 · unverdicted · none · ref 23
OnePred maintains a recursively updated intent memory and uses two-stage RL to predict next queries, cutting token use by up to 22x while outperforming baselines on a new NQP-Bench dataset.
GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment cs.CL · 2026-05-19 · unverdicted · none · ref 53
GoLongRL releases a 23K-sample open long-context RL dataset spanning 9 tasks and introduces TMN-Reweight to improve multitask optimization, achieving performance comparable to much larger models under GRPO.
When Evidence Conflicts: Uncertainty and Order Effects in Retrieval-Augmented Biomedical Question Answering cs.CL · 2026-05-13 · conditional · none · ref 4
Conflicting biomedical evidence triggers order-dependent prediction flips in RAG LLMs, and a new abstention score combining confidence with conflict detection raises selective accuracy by 7-33 points in the hardest conditions.
When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction cs.AI · 2026-05-13 · unverdicted · none · ref 3
Attention to goal tokens declines in multi-turn LLM interactions while residual representations often retain decodable goal information, and the gap between these predicts whether goal-conditioned behavior survives.
ProCompNav: Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries cs.AI · 2026-05-07 · unverdicted · none · ref 44 · 3 links
ProCompNav builds a candidate pool from ambiguous queries then uses pool-splitting binary questions for disambiguation, improving success rate and shortening responses on CoIN-Bench and TextNav.
The Model Knows, the Decoder Finds: Future Value Guided Particle Power Sampling cs.AI · 2026-05-04 · unverdicted · none · ref 21 · 2 links
APPS approximates power targets p(x)^alpha via parallel particle propagation with proposal-corrected reweighting and future-value-guided selection at block boundaries, improving accuracy-runtime trade-offs in training-free decoding.
When Skills Don't Help: A Negative Result on Procedural Knowledge for Tool-Grounded Agents in Offensive Cybersecurity cs.AI · 2026-05-19 · unverdicted · none · ref 9
Re-analysis of MCP-grounded CTF agent runs shows Skills add only 8.9 pp (p=0.71) when tool feedback is strict and low-latency, supporting the hypothesis that high environment-feedback bandwidth reduces the marginal value of curated procedural knowledge.
UserGPT Technical Report cs.IR · 2026-05-09 · unverdicted · none · ref 35
UserGPT introduces a generative LLM framework with a behavior simulation engine, semantization module, and DF-GRPO post-training that scores 0.7325 on tag prediction and 0.7528 on summary generation on HPR-Bench while compressing records by up to 97.9%.

Transactions of the association for computational linguistics , volume=

hub tools

fields

years

verdicts

representative citing papers

citing papers explorer