super hub Baseline reference

RULER: What's the Real Context Size of Your Long-Context Language Models?

Cheng-Ping Hsieh, Dima Rekesh, Fei Jia, Samuel Kriman, Shantanu Acharya, Simeng Sun · 2024 · cs.CL · arXiv 2404.06654

Baseline reference. 57% of citing Pith papers use this work as a benchmark or comparison.

146 Pith papers citing it

Baseline 57% of classified citations

open full Pith review browse 146 citing papers more from Cheng-Ping Hsieh arXiv PDF

abstract

The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of information (the "needle") from long distractor texts (the "haystack"), has been widely adopted to evaluate long-context language models (LMs). However, this simple retrieval-based test is indicative of only a superficial form of long-context understanding. To provide a more comprehensive evaluation of long-context LMs, we create a new synthetic benchmark RULER with flexible configurations for customized sequence length and task complexity. RULER expands upon the vanilla NIAH test to encompass variations with diverse types and quantities of needles. Moreover, RULER introduces new task categories multi-hop tracing and aggregation to test behaviors beyond searching from context. We evaluate 17 long-context LMs with 13 representative tasks in RULER. Despite achieving nearly perfect accuracy in the vanilla NIAH test, almost all models exhibit large performance drops as the context length increases. While these models all claim context sizes of 32K tokens or greater, only half of them can maintain satisfactory performance at the length of 32K. Our analysis of Yi-34B, which supports context length of 200K, reveals large room for improvement as we increase input length and task complexity. We open source RULER to spur comprehensive evaluation of long-context LMs.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

dataset 16 background 12

citation-polarity summary

use dataset 16 background 12

claims ledger

abstract The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of information (the "needle") from long distractor texts (the "haystack"), has been widely adopted to evaluate long-context language models (LMs). However, this simple retrieval-based test is indicative of only a superficial form of long-context understanding. To provide a more comprehensive evaluation of long-context LMs, we create a new synthetic benchmark RULER with flexible configurations for customized sequence length and task complexity. RULER expands upon the vanilla NIAH test to encompass variations wi

authors

Cheng-Ping Hsieh Dima Rekesh Fei Jia Samuel Kriman Shantanu Acharya Simeng Sun

co-cited works

representative citing papers

EntmaxKV: Support-Aware Decoding for Entmax Attention

cs.LG · 2026-05-20 · conditional · novelty 8.0

EntmaxKV enables exact sparse KV-cache decoding for entmax attention via support-aware page selection and a Gaussian threshold estimator, matching full attention quality at a fraction of the cache size with up to 5.43x speedup.

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

cs.AI · 2026-05-12 · conditional · novelty 8.0

MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.

Recursive Language Models

cs.AI · 2025-12-31 · conditional · novelty 8.0

RLMs allow LLMs to handle prompts up to 100x longer than their context window via recursive self-calls on prompt parts, outperforming standard long-context methods on benchmarks.

Morphing into Hybrid Attention Models

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

FlashMorph formulates hybrid layer selection as budget-constrained optimization, trains per-layer gates on synthetic retrieval data with linearization regularization, then discretizes and distills to produce efficient hybrid architectures.

CARVE: Content-Aware Recurrent with Value Efficiency for Chunk-Parallel Linear Attention

cs.CL · 2026-06-25 · conditional · novelty 7.0

CARVE introduces key-axis content-aware gating and value-efficient scalar writes in recurrent linear attention, outperforming GDN-2 on perplexity and retrieval tasks while cutting parameters and memory.

Locality Does Not Imply Reachability: Boundary Repair in Block-Sparse Causal Attention

cs.LG · 2026-06-01 · conditional · novelty 7.0

Fixed block causal masks create reachability boundaries where representations depend only on block prefixes, formalized via dependency sets and phase-conditioned coverage functions, with a parameter-free boundary bridge repair.

Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

cs.AI · 2026-05-29 · unverdicted · novelty 7.0

Proposes the Intelligent Computing Architecture (ICA) as a six-layer framework with dual probabilistic-deterministic planes and three Amdahl-style heuristics to unify design of LLM-based systems.

Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems

cs.AI · 2026-05-25 · unverdicted · novelty 7.0

AgingBench demonstrates multi-dimensional degradation in deployed AI agents through four aging mechanisms diagnosed by temporal graphs and counterfactual probes across hundreds of runs.

ContextEcho: A Benchmark for Persona Drift in Long Agentic-Coding Sessions

cs.CL · 2026-05-22 · unverdicted · novelty 7.0

ContextEcho benchmark shows persona drift occurs across 23 frontier models in long agentic-coding sessions, is not reliably reset by compaction, and can be restored by single-shot anchors with mode-dependent effects.

Positional Failures in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks

cs.CL · 2026-05-22 · conditional · novelty 7.0

Audits reveal no reasoning benchmark controls position/filler/length jointly; CRE shows LLMs drop up to 88pp on middle-position tasks at 64K context, with diagnostic probe supporting positional cause.

ASSEMBLAGE-DEEPHISTORY: A Cross-Build Binary Dataset with Temporal Coverage

cs.CR · 2026-05-20 · unverdicted · novelty 7.0

A new queryable binary dataset combining cross-build diversity, temporal history, and CVE labels with linked metadata for vulnerability research.

MemGym: a Long-Horizon Memory Environment for LLM Agents

cs.CL · 2026-05-20 · unverdicted · novelty 7.0

MemGym unifies agent gyms into a memory benchmark with isolated scoring across tool-use, research, coding, and computer-use regimes plus a lightweight reward model for tractable coding evaluation.

DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

DecisionBench supplies a fixed task suite, model pool, delegation interface, and multi-axis metrics to evaluate emergent delegation, showing similar quality across awareness conditions but 15-31 point headroom under perfect delegation.

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.

MEME: Multi-entity & Evolving Memory Evaluation

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

All tested LLM memory systems fail at dependency reasoning in multi-entity evolving scenarios, with only an expensive file-based setup showing partial recovery.

ProactBench: Beyond What The User Asked For

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

ProactBench measures LLM conversational proactivity in three phases using 198 multi-agent dialogues and finds recovery behavior hard to predict from existing benchmarks.

VORT: Adaptive Power-Law Memory for NLP Transformers

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

VORT assigns learnable fractional orders to tokens and approximates their power-law retention kernels via sum-of-exponentials for efficient long-range dependency modeling in transformers.

When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

A new evaluation protocol shows agent memory reliability degrades variably with added irrelevant sessions depending on agent, memory interface, and scale.

The Text Uncanny Valley: Non-Monotonic Performance Degradation in LLM Information Retrieval

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

LLM information retrieval shows a U-shaped performance drop as words are fragmented by inserted whitespace, attributed to a disordered transition between word-level and character-level processing modes.

SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.

HUGO-CS: A Hybrid-Labeled, Uncertainty-Aware, General-Purpose, Observational Dataset for Cold Spray

cs.LG · 2026-05-05 · accept · novelty 7.0

HUGO-CS is a 4,383-experiment cold-spray dataset extracted from literature via a new hybrid LLM-manual framework that is 30 times larger than prior collections and released with code.

Token Arena: A Continuous Benchmark Unifying Energy and Cognition in AI Inference

cs.AI · 2026-05-01 · unverdicted · novelty 7.0

TokenArena is a continuous benchmark for AI inference endpoints that measures output speed, time to first token, blended price, effective context, quality, and modeled energy to produce composites of joules per correct answer, dollars per correct answer, and endpoint fidelity.

PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training

cs.LG · 2026-04-23 · unverdicted · novelty 7.0

Stealth Pretraining Seeding plants persistent unsafe behaviors in LLMs via diffuse poisoned web content that activates on precise triggers and evades standard evaluation.

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

cs.LG · 2026-04-22 · unverdicted · novelty 7.0 · 2 refs

Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.

citing papers explorer

Showing 48 of 48 citing papers after filters.

EntmaxKV: Support-Aware Decoding for Entmax Attention cs.LG · 2026-05-20 · conditional · none · ref 8 · internal anchor
EntmaxKV enables exact sparse KV-cache decoding for entmax attention via support-aware page selection and a Gaussian threshold estimator, matching full attention quality at a fraction of the cache size with up to 5.43x speedup.
Locality Does Not Imply Reachability: Boundary Repair in Block-Sparse Causal Attention cs.LG · 2026-06-01 · conditional · none · ref 11 · internal anchor
Fixed block causal masks create reachability boundaries where representations depend only on block prefixes, formalized via dependency sets and phase-conditioned coverage functions, with a parameter-free boundary bridge repair.
MEME: Multi-entity & Evolving Memory Evaluation cs.LG · 2026-05-12 · unverdicted · none · ref 4 · internal anchor
All tested LLM memory systems fail at dependency reasoning in multi-entity evolving scenarios, with only an expensive file-based setup showing partial recovery.
ProactBench: Beyond What The User Asked For cs.LG · 2026-05-09 · unverdicted · none · ref 114 · internal anchor
ProactBench measures LLM conversational proactivity in three phases using 198 multi-agent dialogues and finds recovery behavior hard to predict from existing benchmarks.
VORT: Adaptive Power-Law Memory for NLP Transformers cs.LG · 2026-05-09 · unverdicted · none · ref 20 · internal anchor
VORT assigns learnable fractional orders to tokens and approximates their power-law retention kernels via sum-of-exponentials for efficient long-range dependency modeling in transformers.
HUGO-CS: A Hybrid-Labeled, Uncertainty-Aware, General-Purpose, Observational Dataset for Cold Spray cs.LG · 2026-05-05 · accept · none · ref 40 · internal anchor
HUGO-CS is a 4,383-experiment cold-spray dataset extracted from literature via a new hybrid LLM-manual framework that is 30 times larger than prior collections and released with code.
PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training cs.LG · 2026-04-23 · unverdicted · none · ref 135 · internal anchor
Stealth Pretraining Seeding plants persistent unsafe behaviors in LLMs via diffuse poisoned web content that activates on precise triggers and evades standard evaluation.
Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences cs.LG · 2026-04-22 · unverdicted · none · ref 21 · 2 links · internal anchor
Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.
CapTrack: Multifaceted Evaluation of Forgetting in LLM Post-Training cs.LG · 2026-02-19 · unverdicted · none · ref 19 · internal anchor
CapTrack shows post-training causes drift beyond facts, with instruction fine-tuning producing stronger behavioral changes than preference optimization across model families.
Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training cs.LG · 2025-11-10 · unverdicted · none · ref 6 · internal anchor
Q-RAG trains embedders via RL for multi-step retrieval and reports state-of-the-art results on BabiLong and RULER benchmarks for contexts up to 10M tokens.
FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration cs.LG · 2025-02-03 · unverdicted · none · ref 16 · internal anchor
FastKV decouples prefill context reduction via Token-Selective Propagation from independent KV cache selection, delivering up to 1.82x prefill and 2.87x decoding speedups while matching decoding-only accuracy.
MomentKV: Closing the Directional Gap in KV Cache Eviction for Long-Context Inference cs.LG · 2026-06-01 · unverdicted · none · ref 9 · internal anchor
MomentKV maintains count, key mean, value mean, and value-key covariance over evicted tokens to guide selective eviction and provide a first-order approximation of their attention contribution, outperforming baselines on LongBench and RULER.
ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention cs.LG · 2026-05-21 · unverdicted · none · ref 7 · internal anchor
ThriftAttention recovers 89.1% of the FP16 quality gap versus pure FP4 attention by running only 5% of query-key blocks in FP16 on long-context benchmarks.
OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization cs.LG · 2026-05-18 · unverdicted · none · ref 34 · internal anchor
OSCAR achieves near-BF16 accuracy for 2-bit KV cache quantization by using offline spectral covariance-aware rotations aligned with attention, plus a custom deployable INT2 kernel compatible with paged serving.
Remember to Forget: Gated Adaptive Positional Encoding cs.LG · 2026-05-11 · unverdicted · none · ref 12 · internal anchor
GAPE augments RoPE with query- and key-dependent gates to stabilize attention and improve long-context performance in language models.
ProxyKV: Cross-Model Proxy Pruning for Efficient Long-Context LLM Inference cs.LG · 2026-05-09 · unverdicted · none · ref 5 · internal anchor
ProxyKV offloads KV cache importance scoring to a lightweight intra-family small-model proxy with HybridAxialMapper and ranking-focused loss, matching KVZip accuracy while achieving up to 3.21x prefilling speedup on models up to 32B.
RDKV: Rate-Distortion Bit Allocation for Joint Eviction and Quantization of the KV Cache cs.LG · 2026-05-08 · unverdicted · none · ref 49 · internal anchor
RDKV derives per-token and per-channel weights from attention distortion, then uses reverse water-filling to assign bit-widths from full precision to zero after prefilling, recovering 97.81% accuracy with 2.48% cache retention on LongBench.
The Position Curse: LLMs Struggle to Locate the Last Few Items in a List cs.LG · 2026-05-08 · unverdicted · none · ref 9 · internal anchor
LLMs exhibit the Position Curse, with backward position retrieval in lists lagging far behind forward retrieval, showing only partial gains from PosBench fine-tuning.
Sparse Attention as a Range Searching Problem: Towards an Inference-Efficient Index for KV Cache cs.LG · 2026-05-07 · unverdicted · none · ref 24 · internal anchor
Louver is a new index for LLM KV caches that guarantees zero false negatives for keys above a relevance threshold, runs faster than prior sparse and some dense attention methods, and integrates lightly into existing pipelines.
When Does Value-Aware KV Eviction Help? A Fixed-Contract Diagnostic for Non-Monotone Cache Compression cs.LG · 2026-05-07 · unverdicted · none · ref 24 · internal anchor
A fixed-contract probe shows value-aware KV eviction recovers needed evidence in 72.6% of accuracy-improving cases on LongBench but only 32.4% otherwise, suggesting an order of recover evidence, rank value, then preserve couplings.
Efficient Mixture-of-Experts LLM Inference with Apple Silicon NPUs cs.LG · 2026-04-20 · unverdicted · none · ref 20 · internal anchor
NPUMoE accelerates MoE LLM inference on Apple Silicon NPUs via offline-calibrated static expert tiers, grouped execution, and load-aware graph residency, delivering 1.32x-5.55x lower latency and 1.81x-7.37x better energy efficiency.
Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation cs.LG · 2026-04-17 · unverdicted · none · ref 20 · internal anchor
RISE applies CountSketch to dual lexical and semantic channels derived from output-layer gradient outer products, cutting data attribution storage by up to 112x and enabling retrospective and prospective influence analysis on LLMs up to 32B parameters.
IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs cs.LG · 2026-04-12 · unverdicted · none · ref 5 · internal anchor
IceCache combines semantic token clustering with PagedAttention to keep only 25% of the KV cache tokens while retaining 99% accuracy on LongBench and matching or beating prior offloading methods in latency.
In-Place Test-Time Training cs.LG · 2026-04-07 · conditional · none · ref 29 · internal anchor
In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.
M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling cs.LG · 2026-03-15 · unverdicted · none · ref 15 · internal anchor
M²RNN achieves perfect state tracking at unseen lengths and outperforms Gated DeltaNet hybrids by 0.4-0.5 perplexity on 7B models with 3x smaller recurrent states.
S2O: Early Stopping for Sparse Attention via Online Permutation cs.LG · 2026-02-26 · unverdicted · none · ref 9 · internal anchor
S2O uses online permutation and importance-based early stopping to increase effective sparsity in attention, delivering 7.51x attention and 3.81x end-to-end speedups on Llama-3.1-8B at 128K context with preserved accuracy.
Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression cs.LG · 2025-11-26 · unverdicted · none · ref 25 · internal anchor
Gated KalmaNet uses exact Kalman gain computation with adaptive gating and Chebyshev iteration to improve SSM performance on long-context tasks over prior approximations like DeltaNet.
CacheClip: Accelerating RAG with Effective KV Cache Reuse cs.LG · 2025-10-11 · unverdicted · none · ref 42 · internal anchor
CacheClip accelerates RAG prefill by up to 3.33x via auxiliary-model-guided selective KV recomputation while retaining 85-91% of full-attention quality on NIAH and LongBench.
Short window attention enables long-term memorization cs.LG · 2025-09-29 · unverdicted · none · ref 18 · internal anchor
Short sliding windows in hybrid attention-xLSTM models boost long-context performance by encouraging long-term memory use, and stochastic window sizing improves both short and long tasks.
RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference cs.LG · 2025-05-05 · conditional · none · ref 39 · internal anchor
RetroInfer introduces the wave index and wave buffer to realize sparse KV-cache attention for long-context LLM inference with up to 4.4X throughput gains while matching full-attention accuracy.
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval cs.LG · 2024-09-16 · conditional · none · ref 89 · internal anchor
RetrievalAttention approximates full attention in long-context LLMs by retrieving relevant KV vectors from CPU-based ANNS indexes with an attention-aware algorithm, achieving near-full accuracy while accessing only 1-3% of the data.
An Empirical Study of Mamba-based Language Models cs.LG · 2024-06-12 · accept · none · ref 21 · internal anchor
An 8B Mamba-2-Hybrid with 43% Mamba-2, 7% attention, and 50% MLP layers exceeds an 8B Transformer by 2.65 points on average across 12 tasks and matches it on 23 long-context tasks while enabling up to 8x faster inference.
RaBitQCache: Rotated Binary Quantization for KVCache in Long Context LLM Inference cs.LG · 2026-06-30 · unverdicted · none · ref 44 · internal anchor
RaBitQCache proposes rotated binary quantization with binary-INT4 arithmetic for unbiased attention weight estimation in long-context LLMs, enabling adaptive Top-p retrieval and hardware optimizations.
Predict, Reuse, and Repair: Accelerating Dynamic Sparse Attention for Long-Context LLM Decoding cs.LG · 2026-06-29 · conditional · none · ref 8 · internal anchor
PRR accelerates dynamic sparse attention decoding in long-context LLMs via EMA-based prediction, speculative attention, and FlashAttention repair, achieving up to 40% latency reduction.
HARD-KV: Head-Adaptive Regularization for Decoding-time KV Compression cs.LG · 2026-06-27 · unverdicted · none · ref 13 · internal anchor
HARD-KV bridges dynamic head-adaptive KV cache compression with static inference engine constraints via Cascade Cache and Logits Calibration, reporting up to 2x throughput gains on long-context math benchmarks.
MDN: Parallelizing Stepwise Momentum for Delta Linear Attention cs.LG · 2026-05-07 · unverdicted · none · ref 77 · internal anchor
MDN parallelizes stepwise momentum for delta linear attention using geometric reordering and dynamical systems analysis, yielding performance gains over Mamba2 and GDN on 400M and 1.3B models.
StreamIndex: Memory-Bounded Compressed Sparse Attention via Streaming Top-k cs.LG · 2026-05-04 · accept · none · ref 12 · internal anchor
Chunked streaming top-k enables CSA indexer execution at 1M sequence length with 6.21 GB peak memory and >=0.998 recall on synthetic V4-shaped inputs.
Caracal: Causal Architecture via Spectral Mixing cs.LG · 2026-04-30 · unverdicted · none · ref 109 · internal anchor
Caracal is a Fourier-based sequence mixing architecture that achieves causal autoregressive modeling with standard operators and competitive performance on long sequences.
SpikingBrain2.0: Brain-Inspired Foundation Models for Efficient Long-Context and Cross-Platform Inference cs.LG · 2026-04-24 · unverdicted · none · ref 16 · internal anchor
SpikingBrain2.0 is a 5B hybrid spiking-Transformer that recovers most base model performance while delivering 10x TTFT speedup at 4M context and supporting over 10M tokens on limited GPUs via dual sparse attention and dual quantization paths.
FG$^2$-GDN: Enhancing Long-Context Gated Delta Networks with Doubly Fine-Grained Control cs.LG · 2026-04-21 · unverdicted · none · ref 25 · internal anchor
FG²-GDN replaces the scalar beta in the delta update with a channel-wise vector and decouples key/value scaling to improve recall over prior GDN and KDA models.
LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning cs.LG · 2026-04-16 · unverdicted · none · ref 13 · internal anchor
LongAct uses saliency from high-magnitude activations to guide sparse weight updates in long-context RL, yielding about 8% gains on LongBench v2 across multiple algorithms.
Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference cs.LG · 2026-04-08 · unverdicted · none · ref 16 · internal anchor
Flux Attention uses a context-aware Layer Router to dynamically assign full or sparse attention to each LLM layer, achieving up to 2.8x prefill and 2.0x decode speedups with competitive performance on long-context and reasoning tasks.
The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project cs.LG · 2026-03-22 · unverdicted · none · ref 78 · internal anchor
The Workload-Router-Pool architecture is a 3D framework for LLM inference optimization that synthesizes prior vLLM work into a 3x3 interaction matrix and proposes 21 research directions at the intersections.
Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism cs.LG · 2025-10-30 · unverdicted · none · ref 13 · internal anchor
Nirvana adds a task-aware memory trigger and updater to specialized generalist models, achieving strong general benchmark results, lowest perplexity in biomedicine/finance/law, and improved MRI reconstruction fidelity.
Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction cs.LG · 2026-05-18 · unverdicted · none · ref 35 · internal anchor
Structural protection of boundary tokens in globally capped KV cache eviction recovers 69-90% of full-cache quality at 13% retention and dominates differences among scoring policies.
Dynamic Nested Hierarchies: Pioneering Self-Evolution in Machine Learning Architectures for Lifelong Intelligence cs.LG · 2025-11-18 · unverdicted · none · ref 35 · internal anchor
Dynamic nested hierarchies let models self-adjust their multi-level optimization structures to support lifelong learning and adaptation to shifting data distributions.
RAT+: Train Dense, Infer Sparse -- Recurrence Augmented Attention for Dilated Inference cs.LG · 2026-02-20 · unreviewed · ref 17 · 2 links · internal anchor
HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-bench cs.LG · 2026-01-28 · unreviewed · ref 9 · internal anchor

RULER: What's the Real Context Size of Your Long-Context Language Models?

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer