Vim-f: Visual state space model benefiting from learning in the frequency domain

Juntao Zhang, Shaogeng Liu, Jun Zhou, Kun Bian, You Zhou, Jianning Liu, Pei Zhang, Bingyan Liu · 2024 · arXiv 2405.18679

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.

HAMSA: Scanning-Free Vision State Space Models via SpectralPulseNet

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

HAMSA achieves 85.7% ImageNet-1K top-1 accuracy as a spectral-domain SSM with 2.2x faster inference and lower memory than transformers or scanning-based SSMs.

Caracal: Causal Architecture via Spectral Mixing

cs.LG · 2026-04-30 · unverdicted · novelty 5.0

Caracal is a Fourier-based sequence mixing architecture that achieves causal autoregressive modeling with standard operators and competitive performance on long sequences.

citing papers explorer

Showing 3 of 3 citing papers.

TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles

cs.CV · 2026-05-12 · unverdicted · none · ref 72

HAMSA: Scanning-Free Vision State Space Models via SpectralPulseNet

cs.CV · 2026-04-16 · unverdicted · none · ref 76

Caracal: Causal Architecture via Spectral Mixing

cs.LG · 2026-04-30 · unverdicted · none · ref 94
