Falcon Mamba: The First Competitive Attention-free 7B Language Model

Falcon Mamba: The First Competitive Attention-free 7B Language Model · 2024 · arXiv 2410.05355

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Timesteps of Mamba Align with Human Reading Times

cs.CL · 2026-06-29 · unverdicted · novelty 6.0

Mamba's per-word timesteps significantly predict human reading times beyond GPT-2 surprisal in a naturalistic dataset.

LLM Self-Recognition: Steering and Retrieving Activation Signatures

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

Steering LLM residual streams with random sparse vectors creates detectable self-recognition fingerprints that enable over 98% accurate attribution of generated text to specific models without degrading output quality.

Towards Large Model Feature Coding

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

LaMoFCBench is a new benchmark covering 4 categories and 16 scenarios that exposes misalignment between mainstream feature codecs and the heterogeneous statistics of large-model activations.

PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

PoM is a new linear-complexity token mixer using learned polynomials that matches attention performance in transformers while enabling efficient long-sequence processing.

Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models

cs.AR · 2026-04-04 · unverdicted · novelty 6.0

Mambalaya delivers 4.9x prefill and 1.9x generation speedups on Mamba layers over prior accelerators by systematically fusing inter-Einsum operations.

SpikingBrain: Spiking Brain-inspired Large Models

cs.LG · 2025-09-05 · unverdicted · novelty 6.0

SpikingBrain-7B and SpikingBrain-76B achieve Transformer-comparable performance after continual pre-training on 150B tokens, with over 100x TTFT speedup on 4M-token sequences and 69.15% sparsity from event-driven spiking.

GRAIN: Group Aggregation via Min-Norm Objective

cs.LG · 2026-06-22 · unverdicted · novelty 5.0

GRAIN is a gradient aggregation method using min-norm objectives to ensure non-negative inner products with group gradients, yielding tighter uniform stability bounds than SGD under smoothness assumptions.

Rethinking Uncertainty Estimation in LLMs: A Principled Single-Sequence Measure

cs.LG · 2024-12-19 · unverdicted · novelty 5.0

Negative log-likelihood of the greedy-decoded most likely sequence (G-NLL) is a principled single-sequence uncertainty measure for LLMs that achieves state-of-the-art results.

Token-Operations-Oriented Inference Optimization Techniques for Large Models

cs.SE · 2026-06-18 · unverdicted · novelty 3.0

The paper introduces a four-layer technical architecture for token-operations-oriented inference optimization in large models and reviews key technologies and industry status at each layer.

citing papers explorer

Showing 7 of 7 citing papers after filters.

Timesteps of Mamba Align with Human Reading Times · cs.CL · 2026-06-29 · unverdicted · none · ref 71
Mamba's per-word timesteps significantly predict human reading times beyond GPT-2 surprisal in a naturalistic dataset.

LLM Self-Recognition: Steering and Retrieving Activation Signatures · cs.AI · 2026-06-04 · unverdicted · none · ref 51
Steering LLM residual streams with random sparse vectors creates detectable self-recognition fingerprints that enable over 98% accurate attribution of generated text to specific models without degrading output quality.

Towards Large Model Feature Coding · cs.CV · 2026-05-20 · unverdicted · none · ref 31
LaMoFCBench is a new benchmark covering 4 categories and 16 scenarios that exposes misalignment between mainstream feature codecs and the heterogeneous statistics of large-model activations.

PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer · cs.CV · 2026-04-07 · unverdicted · none · ref 84
PoM is a new linear-complexity token mixer using learned polynomials that matches attention performance in transformers while enabling efficient long-sequence processing.

Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models · cs.AR · 2026-04-04 · unverdicted · none · ref 14
Mambalaya delivers 4.9x prefill and 1.9x generation speedups on Mamba layers over prior accelerators by systematically fusing inter-Einsum operations.

GRAIN: Group Aggregation via Min-Norm Objective · cs.LG · 2026-06-22 · unverdicted · none · ref 10
GRAIN is a gradient aggregation method using min-norm objectives to ensure non-negative inner products with group gradients, yielding tighter uniform stability bounds than SGD under smoothness assumptions.

Token-Operations-Oriented Inference Optimization Techniques for Large Models · cs.SE · 2026-06-18 · unverdicted · none · ref 228
The paper introduces a four-layer technical architecture for token-operations-oriented inference optimization in large models and reviews key technologies and industry status at each layer.