EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.
hub
A convnet for the 2020s
15 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
MAPS provides 2618 validated 3D meshes and a controllable rendering pipeline to attribute vision model recognition failures to specific scene parameters, finding camera distance and elevation as the dominant failure factors across 20 tested models.
GraphScan replaces geometric or coordinate-based scanning in Vision SSMs with learned local semantic graph routing, yielding SOTA results among such models on classification and segmentation tasks.
ML climate emulators degrade under seasonal distribution shifts that proxy long-term climate change, but physically motivated compositional decompositions improve out-of-distribution performance with modest in-distribution trade-offs.
Contrastive pretraining on mammography atlas image-text pairs improves BI-RADS classification F1 by 1-14% especially in low-label regimes, outperforming equivalent numbers of direct labels in some settings.
A transformer-based neural renderer that transfers arbitrary PBR lighting to single images via shared intrinsic conditioning extracted from both multi-illumination photos and path-traced coarse 3D renders.
Linear mappings in feature space can reconstruct a wide range of image manipulations including semantic edits, suggesting that feature representations are approximately linearly organized.
Hierarchy-Aware Cross-Entropy improves image classification by incorporating class hierarchies into the loss through prediction aggregation and ancestral label smoothing, achieving mean accuracy gains of 4.66% in end-to-end training and 2.18% in linear probing.
Dynamic parameterization of standard layers can replace explicit attention for linear-time global visual modeling.
CXR-LT 2026 introduces a radiologist-annotated multi-center dataset of 145k+ CXRs to benchmark robust multi-label classification on known classes and open-world generalization to unseen rare diseases.
StableTTA improves ImageNet-1K accuracy across 71 vision models by stabilizing logit aggregation under coherent-batch inference and enabling efficient single-forward-pass adaptation.
TRUST is a test-time adaptation method for SSM vision models that uses uncertainty-guided traversal permutations to refine Mamba parameters via pseudo-labels and weight averaging, improving robustness on distribution shifts.
Stimulus symmetries render many neural representations functionally equivalent yet produce qualitatively different RSMs, including drifting ones from SGD or regularization in image-encoding networks.
ShellfishNet is a new benchmark of 8,691 images across 32 mollusc taxa for evaluating vision models on real-world underwater ecological monitoring tasks including robustness to degradation.
A masked-diffusion pretrained convolutional model outperforms ViT pathology foundation models on cell-level dense prediction tasks in histology.
citing papers explorer
-
Rotation Equivariant Mamba for Vision Tasks
EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.
-
MAPS: A Synthetic Dataset for Probing Vision Models in a Controlled 3D Scene Space
MAPS provides 2618 validated 3D meshes and a controllable rendering pipeline to attribute vision model recognition failures to specific scene parameters, finding camera distance and elevation as the dominant failure factors across 20 tested models.
-
Can Graphs Help Vision SSMs See Better?
GraphScan replaces geometric or coordinate-based scanning in Vision SSMs with learned local semantic graph routing, yielding SOTA results among such models on classification and segmentation tasks.
-
No Epoch Like the Present: Robust Climate Emulation Requires Out-of-Distribution Generalisation
ML climate emulators degrade under seasonal distribution shifts that proxy long-term climate change, but physically motivated compositional decompositions improve out-of-distribution performance with modest in-distribution trade-offs.
-
MAM-CLIP: Vision-Language Pretraining on Mammography Atlases for BI-RADS Classification
Contrastive pretraining on mammography atlas image-text pairs improves BI-RADS classification F1 by 1-14% especially in low-label regimes, outperforming equivalent numbers of direct labels in some settings.
-
PIXLRelight: Controllable Relighting via Intrinsic Conditioning
A transformer-based neural renderer that transfers arbitrary PBR lighting to single images via shared intrinsic conditioning extracted from both multi-illumination photos and path-traced coarse 3D renders.
-
FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry
Linear mappings in feature space can reconstruct a wide range of image manipulations including semantic edits, suggesting that feature representations are approximately linearly organized.
-
When Labels Have Structure: Improving Image Classification with Hierarchy-Aware Cross-Entropy
Hierarchy-Aware Cross-Entropy improves image classification by incorporating class hierarchies into the loss through prediction aggregation and ancestral label smoothing, achieving mean accuracy gains of 4.66% in end-to-end training and 2.18% in linear probing.
-
Linear-Time Global Visual Modeling without Explicit Attention
Dynamic parameterization of standard layers can replace explicit attention for linear-time global visual modeling.
-
CXR-LT 2026 Challenge: Multi-Center Long-Tailed and Zero Shot Chest X-ray Classification
CXR-LT 2026 introduces a radiologist-annotated multi-center dataset of 145k+ CXRs to benchmark robust multi-label classification on known classes and open-world generalization to unseen rare diseases.
-
StableTTA: Improving Vision Model Performance by Training-free Test-Time Adaptation Methods
StableTTA improves ImageNet-1K accuracy across 71 vision models by stabilizing logit aggregation under coherent-batch inference and enabling efficient single-forward-pass adaptation.
-
TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses
TRUST is a test-time adaptation method for SSM vision models that uses uncertainty-guided traversal permutations to refine Mamba parameters via pseudo-labels and weight averaging, improving robustness on distribution shifts.
-
Stimulus symmetries can confound representational similarity analyses
Stimulus symmetries render many neural representations functionally equivalent yet produce qualitatively different RSMs, including drifting ones from SGD or regularization in image-encoding networks.
-
ShellfishNet: A Domain-Specific Benchmark for Visual Recognition of Marine Molluscs
ShellfishNet is a new benchmark of 8,691 images across 32 mollusc taxa for evaluating vision models on real-world underwater ecological monitoring tasks including robustness to degradation.
-
Beyond ViT Tokens: Masked-Diffusion Pretrained Convolutional Pathology Foundation Model for Cell-Level Dense Prediction
A masked-diffusion pretrained convolutional model outperforms ViT pathology foundation models on cell-level dense prediction tasks in histology.