Mamba's per-word timesteps significantly predict human reading times beyond GPT-2 surprisal in a naturalistic dataset.
Falcon mamba: The first competitive attention-free 7b language model
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 9roles
background 1polarities
background 1representative citing papers
Steering LLM residual streams with random sparse vectors creates detectable self-recognition fingerprints that enable over 98% accurate attribution of generated text to specific models without degrading output quality.
LaMoFCBench is a new benchmark covering 4 categories and 16 scenarios that exposes misalignment between mainstream feature codecs and the heterogeneous statistics of large-model activations.
PoM is a new linear-complexity token mixer using learned polynomials that matches attention performance in transformers while enabling efficient long-sequence processing.
Mambalaya delivers 4.9x prefill and 1.9x generation speedups on Mamba layers over prior accelerators by systematically fusing inter-Einsum operations.
SpikingBrain-7B and SpikingBrain-76B achieve Transformer-comparable performance after continual pre-training on 150B tokens, with over 100x TTFT speedup on 4M-token sequences and 69.15% sparsity from event-driven spiking.
GRAIN is a gradient aggregation method using min-norm objectives to ensure non-negative inner products with group gradients, yielding tighter uniform stability bounds than SGD under smoothness assumptions.
Negative log-likelihood of the greedy-decoded most likely sequence (G-NLL) is a principled single-sequence uncertainty measure for LLMs that achieves state-of-the-art results.
The paper introduces a four-layer technical architecture for token-operations-oriented inference optimization in large models and reviews key technologies and industry status at each layer.
citing papers explorer
-
Timesteps of Mamba Align with Human Reading Times
Mamba's per-word timesteps significantly predict human reading times beyond GPT-2 surprisal in a naturalistic dataset.
-
LLM Self-Recognition: Steering and Retrieving Activation Signatures
Steering LLM residual streams with random sparse vectors creates detectable self-recognition fingerprints that enable over 98% accurate attribution of generated text to specific models without degrading output quality.
-
Towards Large Model Feature Coding
LaMoFCBench is a new benchmark covering 4 categories and 16 scenarios that exposes misalignment between mainstream feature codecs and the heterogeneous statistics of large-model activations.
-
PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer
PoM is a new linear-complexity token mixer using learned polynomials that matches attention performance in transformers while enabling efficient long-sequence processing.
-
Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models
Mambalaya delivers 4.9x prefill and 1.9x generation speedups on Mamba layers over prior accelerators by systematically fusing inter-Einsum operations.
-
GRAIN: Group Aggregation via Min-Norm Objective
GRAIN is a gradient aggregation method using min-norm objectives to ensure non-negative inner products with group gradients, yielding tighter uniform stability bounds than SGD under smoothness assumptions.
-
Token-Operations-Oriented Inference Optimization Techniques for Large Models
The paper introduces a four-layer technical architecture for token-operations-oriented inference optimization in large models and reviews key technologies and industry status at each layer.