IE-as-Cache framework repurposes information extraction as a dynamic cognitive cache to improve agentic reasoning accuracy in LLMs on challenging benchmarks.
Qmsum: A new benchmark for query-based multi-domain meeting summarization
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
The thesis presents a kernel method for multiaccuracy across overlooked subpopulations, information-theoretic optimal watermarking for LLMs, and a simulator showing LLM agents outperforming humans in supply chains while creating tail risks.
Ada-KV is the first head-wise adaptive KV cache budget allocator for LLMs, using a theoretical loss upper bound to allocate eviction differently per attention head and yielding higher quality than uniform methods on long-context benchmarks.
SnapKV selects clustered important KV positions per attention head from an observation window at the prompt end, yielding 3.6x faster generation and 8.2x better memory efficiency on 16K-token inputs with comparable performance across 16 datasets.
RetNet is a new sequence modeling architecture that delivers parallel training, constant-time inference, and competitive language modeling performance as a potential replacement for Transformers.
K-VEC is a coverage-aware KV-cache eviction strategy using cross-head and cross-layer modules that improves performance by up to 10.35 points over prior methods on LongBench subsets at fixed memory budget.
E2LLM uses encoder-based soft prompt compression for long contexts to improve LLM reasoning on tasks like summarization and QA while maintaining efficiency.
A survey of RAG paradigms, components, benchmarks, and challenges for improving LLMs on knowledge-intensive tasks.
citing papers explorer
-
SnapKV: LLM Knows What You are Looking for Before Generation
SnapKV selects clustered important KV positions per attention head from an observation window at the prompt end, yielding 3.6x faster generation and 8.2x better memory efficiency on 16K-token inputs with comparable performance across 16 datasets.
-
Retrieval-Augmented Generation for Large Language Models: A Survey
A survey of RAG paradigms, components, benchmarks, and challenges for improving LLMs on knowledge-intensive tasks.