Falcon Mamba: The first competitive attention-free 7B language model
2 papers cite this work.

Citing papers
- PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer
  PoM is a new linear-complexity token mixer using learned polynomials that matches attention performance in transformers while enabling efficient long-sequence processing (a toy sketch of the idea follows this list).
- Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models
  Mambalaya delivers 4.9x prefill and 1.9x generation speedups on Mamba layers over prior accelerators by systematically fusing inter-Einsum operations (an einsum-fusion sketch follows this list).
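To make the PoM summary concrete, here is a minimal, illustrative sketch of a linear-complexity polynomial token mixer in PyTorch. It only shows the general idea (per-token polynomial feature expansion plus an O(seq_len) global pooling step); the class name ToyPolynomialMixer, the degree, the mean pooling, and the output projection are assumptions of this sketch, not the PoM layer defined in the paper.

```python
import torch
import torch.nn as nn


class ToyPolynomialMixer(nn.Module):
    """Illustrative linear-complexity token mixer based on polynomial features.

    NOTE: a sketch of the general idea only, not the PoM layer from the
    paper; the degree, mean pooling, and projection are assumptions made
    for this example.
    """

    def __init__(self, dim: int, degree: int = 2):
        super().__init__()
        self.degree = degree
        # Project the pooled polynomial statistics back to the model dimension.
        self.proj = nn.Linear(degree * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        # Per-token monomial features up to the chosen degree.
        feats = torch.cat([x ** k for k in range(1, self.degree + 1)], dim=-1)
        # Global summary of the sequence; cost is linear in seq_len.
        pooled = feats.mean(dim=1, keepdim=True)   # (batch, 1, degree * dim)
        context = self.proj(pooled)                # (batch, 1, dim)
        # Every token reads the same global polynomial statistics.
        return x + context


# Usage: cost grows linearly with sequence length, unlike quadratic attention.
mixer = ToyPolynomialMixer(dim=64)
y = mixer(torch.randn(2, 4096, 64))   # y has shape (2, 4096, 64)
```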
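The Mambalaya summary turns on fusing chains of Einsum operations. The snippet below shows the generic inter-einsum fusion idea with NumPy on made-up shapes: two einsums that materialize an intermediate tensor are re-expressed as a single contraction. The shapes and operand names are placeholders, not Mambalaya's actual Mamba-layer kernels.

```python
import numpy as np

# Placeholder shapes standing in for a state-space-style computation;
# NOT Mambalaya's kernels, just the generic inter-einsum fusion idea.
B, L, D, N = 2, 16, 8, 4
x = np.random.randn(B, L, D)
A = np.random.randn(D, N)
C = np.random.randn(L, N)

# Unfused: two einsums, with the intermediate (B, L, N) tensor materialized.
h = np.einsum('bld,dn->bln', x, A)
y_unfused = np.einsum('bln,ln->bl', h, C)

# Fused: the same computation expressed as one einsum. A fusing backend or
# accelerator can schedule this without writing the intermediate tensor
# back to memory; NumPy's optimize=True just picks a good contraction order.
y_fused = np.einsum('bld,dn,ln->bl', x, A, C, optimize=True)

assert np.allclose(y_unfused, y_fused)
```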