Imprint compresses egocentric observations into interaction patterns via online memory compression, raising QA accuracy from 31.0% to 35.8% while cutting memory 2.3× and latency 11.8× on a seven-day benchmark.
Retrieval-augmented generation with hierarchical knowledge. arXiv preprint arXiv:2503.10150
9 Pith papers cite this work.
Representative citing papers
DeepRefine refines agent-compiled knowledge bases via multi-turn abductive diagnosis and RL training with a GBD reward, yielding consistent downstream task gains.
ASTRA-QA is a benchmark for abstract document question answering that uses explicit topic sets, unsupported content annotations, and evidence alignments to enable direct scoring of coverage and hallucination.
SkillRAE organizes skills into a graph and compiles compact, grounded contexts for LLM agents, yielding 11.7% gains on SkillsBench over prior RAE methods.
MemSearch-o1 mitigates memory dilution in agentic LLM search through reasoning-aligned token-level memory growth, retracing with a contribution function, and path reorganization, improving reasoning activation on benchmarks.
AtlasKV integrates billion-scale KGs into LLMs parametrically with sub-linear complexity and low memory by converting triples into key-value representations handled by the model's attention.
Ψ-RAG improves cross-document multi-hop QA performance using an adaptive hierarchical abstract tree and agent-powered hybrid retrieval, outperforming RAPTOR by 25.9% and HippoRAG 2 by 7.4% in average F1.
R²-Searcher introduces fine-grained evidence modeling, retrieval reflection, and R²PO RL to calibrate retrieval-reasoning boundaries and improve multi-hop QA performance.
LLM-driven feature synthesis from data-rich verticals improves MTL ranking models in data-sparse verticals via taxonomic features from user histories.