ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 25.
6 Pith papers cite this work.
6 representative citing papers
citing papers explorer
-
TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles
TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation by up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.
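The idea of token-conditioned stable poles can be sketched as a diagonal SSM scan in which each token predicts its own decay rate. This is a minimal toy illustration, not the paper's actual parameterization: the conditioning weights `w`, the sigmoid pole map, and the input mixing are all assumptions, and complex (oscillatory) poles are omitted for simplicity. Mapping the predicted pole through a sigmoid keeps it in (0, 1), so the recurrence stays stable for any token content.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8  # token feature dimension
T = 16  # number of visual tokens in the scan
w = rng.normal(size=d)  # hypothetical pole-conditioning weights

def token_conditioned_scan(x):
    """Diagonal SSM scan with a per-token pole (decay rate).

    a_t = sigmoid(w . x_t) lies in (0, 1), so |a_t| < 1 and the
    recurrence is stable; a_t near 1 means long memory, near 0 means
    the state is quickly overwritten by the current token.
    """
    h = np.zeros(d)
    outs = []
    for x_t in x:
        a_t = 1.0 / (1.0 + np.exp(-(w @ x_t)))  # stable pole in (0, 1)
        h = a_t * h + (1.0 - a_t) * x_t          # input-dependent memory decay
        outs.append(h.copy())
    return np.stack(outs)

x = rng.normal(size=(T, d))
y = token_conditioned_scan(x)
print(y.shape)  # (16, 8)
```

Because each step is a convex combination of the previous state and the current token, the state is bounded by the input range, which is one way explicit pole control translates into predictable memory behavior.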
-
Can Graphs Help Vision SSMs See Better?
GraphScan replaces geometric or coordinate-based scanning in Vision SSMs with learned local semantic graph routing, yielding SOTA results among such models on classification and segmentation tasks.
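Replacing a coordinate-based scan with semantic routing can be illustrated with a toy stand-in: build pairwise cosine similarities over token features and traverse tokens greedily by similarity instead of raster order. This sketch is an assumption for illustration only; GraphScan's actual routing is learned over local semantic graphs, not this fixed greedy rule.

```python
import numpy as np

def semantic_scan_order(tokens, start=0):
    """Greedy nearest-neighbour traversal over token features.

    Instead of a fixed raster/coordinate scan, repeatedly visit the
    unvisited token most similar (cosine) to the current one.
    """
    feats = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    sim = feats @ feats.T  # pairwise cosine similarity
    n = len(tokens)
    order = [start]
    remaining = set(range(n)) - {start}
    cur = start
    while remaining:
        nxt = max(remaining, key=lambda j: sim[cur, j])
        order.append(nxt)
        remaining.remove(nxt)
        cur = nxt
    return order

rng = np.random.default_rng(0)
tokens = rng.normal(size=(9, 4))  # e.g. a 3x3 grid of visual tokens
order = semantic_scan_order(tokens)
print(order)
```

The resulting permutation visits semantically adjacent tokens consecutively, which is the property a recurrent scan benefits from when nearby-in-sequence should mean related-in-content.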
-
What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion
Prior-Aligned Autoencoders shape latent manifolds with spatial coherence, local continuity, and global semantics to improve latent diffusion, achieving SOTA gFID 1.03 on ImageNet 256×256 with up to 13× faster convergence.
-
When Losses Align: Gradient-Based Composite Loss Weighting for Efficient Pretraining
A bilevel method learns composite pretraining loss weights online via gradient alignment with a downstream objective, matching hand-tuned baselines while adding roughly 30% overhead to a single training run.
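One step of gradient-alignment weighting can be sketched as follows: each pretraining loss receives weight mass in proportion to the cosine alignment between its gradient and the downstream objective's gradient. The exponentiated-gradient update, the step size `lr`, and the function names are assumptions for illustration, not the paper's exact bilevel procedure.

```python
import numpy as np

def align_weights(pretrain_grads, downstream_grad, lr=0.5, weights=None):
    """One online update of composite-loss weights via gradient alignment.

    Losses whose gradients point in the same direction as the downstream
    gradient (high cosine similarity) are upweighted; anti-aligned losses
    are downweighted. Weights stay on the simplex.
    """
    k = len(pretrain_grads)
    if weights is None:
        weights = np.full(k, 1.0 / k)  # start from uniform weights
    g_d = downstream_grad / np.linalg.norm(downstream_grad)
    align = np.array([(g @ g_d) / np.linalg.norm(g) for g in pretrain_grads])
    logits = np.log(weights) + lr * align  # exponentiated-gradient step
    w = np.exp(logits - logits.max())      # shift for numerical stability
    return w / w.sum()

g1 = np.array([1.0, 0.0])   # gradient aligned with the downstream objective
g2 = np.array([-1.0, 0.0])  # anti-aligned gradient
gd = np.array([1.0, 0.0])   # downstream gradient
w = align_weights([g1, g2], gd)
print(w)  # the aligned loss receives more weight
```

Running this update once per step (or every few steps) is what makes the method "online": no separate grid search over loss weights is needed, only the extra gradient computations, which is where the reported ~30% overhead would come from.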
-
Hierarchical Dual-Subspace Decoupling for Continual Learning in Vision-Language Models
HDSD decouples parameter subspaces in vision-language models via a Feature Modulation Module, General Fusion Module with adaptive thresholds, and Hierarchical Learning Module with SVD scaling to minimize cross-task interference and achieve state-of-the-art class-incremental learning performance.
-
CAST: Collapse-Aware multi-Scale Topology Fusion for Multimodal Coreset Selection
CAST selects better multimodal coresets by fusing collapse-aware topologies across modalities and matching distributions at multiple scales in the diffusion wavelet domain.