LoopUS converts pretrained LLMs into looped latent refinement models via block decomposition, selective gating, random deep supervision, and confidence-based early exiting to improve reasoning performance.
hub
The LAMBADA dataset: Word prediction requiring a broad discourse context
14 Pith papers cite this work. Polarity classification is still indexing.
hub tools
representative citing papers
Variable codebook sizes that increase along the sequence in visual tokenizers reduce generation FID scores significantly for autoregressive models on ImageNet.
A fitted iso-depth scaling law measures that one recurrence in looped transformers is worth r^0.46 unique blocks in validation loss.
Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.
GAIA benchmark shows humans at 92% accuracy on simple real-world questions far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.
Attractor Models solve for fixed points in transformer embeddings using implicit differentiation to enable stable iterative refinement, delivering better perplexity, accuracy, and efficiency than standard or looped transformers.
AdaSplash-2 introduces a histogram-based initialization for the α-entmax normalizer that cuts iterations to 1-2 and, with a sparsity-aware GPU kernel, matches or beats FlashAttention-2 training speed at moderate-to-high sparsity while delivering long-context gains.
EdgeFlow reduces mobile LLM cold-start latency up to 4.07x versus llama.cpp, MNN, and llm.npu by NPU-aware adaptive quantization, SIMD-friendly packing, and synergistic granular CPU-NPU pipelining at comparable accuracy.
Output-aware EM initialization for codebooks in additive quantization avoids poor optimization basins and yields better 2-bit compressed LLMs across Llama and Qwen models.
StreamingLLM lets finite-window LLMs generalize to infinite-length sequences by retaining initial-token KV states as attention sinks, enabling stable streaming inference up to 4M tokens.
DyT improves validation loss 27% at 64M params/1M tokens but worsens it 19% at 118M tokens, with saturation levels predicting the sign of the effect.
ReSpinQuant achieves state-of-the-art accuracy in W4A4 and W3A3 LLM quantization by using efficient residual subspace rotation approximations that match layer-wise performance while retaining the inference speed of global rotation methods.
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.
citing papers explorer
-
LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models
LoopUS converts pretrained LLMs into looped latent refinement models via block decomposition, selective gating, random deep supervision, and confidence-based early exiting to improve reasoning performance.
-
Taming the Entropy Cliff: Variable Codebook Size Quantization for Autoregressive Visual Generation
Variable codebook sizes that increase along the sequence in visual tokenizers reduce generation FID scores significantly for autoregressive models on ImageNet.
-
How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models
A fitted iso-depth scaling law measures that one recurrence in looped transformers is worth r^0.46 unique blocks in validation loss.
-
Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences
Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.
-
GAIA: a benchmark for General AI Assistants
GAIA benchmark shows humans at 92% accuracy on simple real-world questions far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.
-
Solve the Loop: Attractor Models for Language and Reasoning
Attractor Models solve for fixed points in transformer embeddings using implicit differentiation to enable stable iterative refinement, delivering better perplexity, accuracy, and efficiency than standard or looped transformers.
-
AdaSplash-2: Faster Differentiable Sparse Attention
AdaSplash-2 introduces a histogram-based initialization for the α-entmax normalizer that cuts iterations to 1-2 and, with a sparsity-aware GPU kernel, matches or beats FlashAttention-2 training speed at moderate-to-high sparsity while delivering long-context gains.
-
EdgeFlow: Fast Cold Starts for LLMs on Mobile Devices
EdgeFlow reduces mobile LLM cold-start latency up to 4.07x versus llama.cpp, MNN, and llm.npu by NPU-aware adaptive quantization, SIMD-friendly packing, and synergistic granular CPU-NPU pipelining at comparable accuracy.
-
Initialisation Determines the Basin: Efficient Codebook Optimisation for Extreme LLM Quantization
Output-aware EM initialization for codebooks in additive quantization avoids poor optimization basins and yields better 2-bit compressed LLMs across Llama and Qwen models.
-
Efficient Streaming Language Models with Attention Sinks
StreamingLLM lets finite-window LLMs generalize to infinite-length sequences by retaining initial-token KV states as attention sinks, enabling stable streaming inference up to 4M tokens.
-
When Does Removing LayerNorm Help? Activation Bounding as a Regime-Dependent Implicit Regularizer
DyT improves validation loss 27% at 64M params/1M tokens but worsens it 19% at 118M tokens, with saturation levels predicting the sign of the effect.
-
ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation
ReSpinQuant achieves state-of-the-art accuracy in W4A4 and W3A3 LLM quantization by using efficient residual subspace rotation approximations that match layer-wise performance while retaining the inference speed of global rotation methods.
-
PaLM 2 Technical Report
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
-
Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility
Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.