R Chris Miall and Daniel M Wolpert

URL https://arxiv · 2024 · arXiv 2404.08819

9 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.

The UNDO Flip-Flop: A Controlled Probe for Reversible Semantic State Management in State Space Model

cs.LG · 2026-04-07 · unverdicted · novelty 7.0

Mamba-2 models fail to learn reversible state retrieval in the UNDO Flip-Flop task, defaulting to a toggle heuristic and achieving only 41% accuracy under adversarial conditions.

UniMamba: A Unified Spatial-Temporal Modeling Framework with State-Space and Attention Integration

cs.LG · 2026-03-06 · unverdicted · novelty 6.0

UniMamba integrates Mamba state-space dynamics with attention layers and transforms like FFT-Laplace to outperform prior models on multivariate time series forecasting benchmarks.

Kimi Linear: An Expressive, Efficient Attention Architecture

cs.CL · 2025-10-30 · unverdicted · novelty 6.0

Kimi Linear hybridizes linear attention with a new KDA module to beat full attention on tasks while slashing KV cache by 75% and speeding decoding up to 6x.

Adaptive Memory Decay for Log-Linear Attention

cs.LG · 2026-05-07 · conditional · novelty 5.0

Making memory decay input-dependent via a lightweight MLP improves log-linear attention performance on associative recall, selective copying, and language modeling, especially for long sequences.

The Serial Scaling Hypothesis

cs.LG · 2025-07-16 · unverdicted · novelty 5.0

The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.

Measuring AI Reasoning: A Guide for Researchers

cs.AI · 2026-05-04 · unverdicted · novelty 4.0

Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.

A Survey of Mamba

cs.LG · 2024-08-02 · unverdicted · novelty 2.0

The paper consolidates existing research on Mamba models, their architecture variants, adaptations to different data modalities, and applications across domains.

Next-Latent Prediction Transformers Learn Compact World Models

cs.LG · 2025-11-08

citing papers explorer


Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences · cs.LG · 2026-04-22 · unverdicted · none · ref 34
The UNDO Flip-Flop: A Controlled Probe for Reversible Semantic State Management in State Space Model · cs.LG · 2026-04-07 · unverdicted · none · ref 13
UniMamba: A Unified Spatial-Temporal Modeling Framework with State-Space and Attention Integration · cs.LG · 2026-03-06 · unverdicted · none · ref 18
Kimi Linear: An Expressive, Efficient Attention Architecture · cs.CL · 2025-10-30 · unverdicted · none · ref 65
Adaptive Memory Decay for Log-Linear Attention · cs.LG · 2026-05-07 · conditional · none · ref 24
The Serial Scaling Hypothesis · cs.LG · 2025-07-16 · unverdicted · none · ref 74
Measuring AI Reasoning: A Guide for Researchers · cs.AI · 2026-05-04 · unverdicted · none · ref 33
A Survey of Mamba · cs.LG · 2024-08-02 · unverdicted · none · ref 137
Next-Latent Prediction Transformers Learn Compact World Models · cs.LG · 2025-11-08 · unreviewed · ref 24
