Multi-agent LLM systems discover new Transformer and hybrid architectures that outperform Llama 3.2 at 1B scale and approach human SOTA on long-range benchmarks.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages=
5 Pith papers cite this work. Polarity classification is still indexing.
years
2026 5representative citing papers
Attention to goal tokens declines in multi-turn LLM interactions while residual representations often retain decodable goal information, and the gap between these predicts whether goal-conditioned behavior survives.
TTS adapts speculator models online via target model verifications to improve acceptance lengths by up to 72% over prior methods, with gains increasing for longer generations.
DASH-KV accelerates long-context LLM inference to linear complexity via asymmetric KV cache hashing and mixed-precision retention, matching full attention performance on LongBench.
citing papers explorer
-
Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design
Multi-agent LLM systems discover new Transformer and hybrid architectures that outperform Llama 3.2 at 1B scale and approach human SOTA on long-range benchmarks.
-
When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction
Attention to goal tokens declines in multi-turn LLM interactions while residual representations often retain decodable goal information, and the gap between these predicts whether goal-conditioned behavior survives.
-
Test-Time Speculation
TTS adapts speculator models online via target model verifications to improve acceptance lengths by up to 72% over prior methods, with gains increasing for longer generations.
-
DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing
DASH-KV accelerates long-context LLM inference to linear complexity via asymmetric KV cache hashing and mixed-precision retention, matching full attention performance on LongBench.
- Simplified Sparse Attention via Gist Tokens