A ConvNet for the 2020s

Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie · 2022

15 Pith papers cite this work. Polarity classification is still indexing.

Citation-role summary: background 2, baseline 1.

Citation-polarity summary: background 2, baseline 1.

representative citing papers

Rotation Equivariant Mamba for Vision Tasks

cs.CV · 2026-03-10 · unverdicted · novelty 8.0

EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.

MAPS: A Synthetic Dataset for Probing Vision Models in a Controlled 3D Scene Space

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

MAPS provides 2618 validated 3D meshes and a controllable rendering pipeline to attribute vision model recognition failures to specific scene parameters, finding that camera distance and elevation are the dominant failure factors across 20 tested models.

Can Graphs Help Vision SSMs See Better?

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

GraphScan replaces geometric or coordinate-based scanning in Vision SSMs with learned local semantic graph routing, yielding state-of-the-art results among Vision SSMs on classification and segmentation tasks.

No Epoch Like the Present: Robust Climate Emulation Requires Out-of-Distribution Generalisation

cs.LG · 2026-05-21 · conditional · novelty 6.0

ML climate emulators degrade under seasonal distribution shifts that proxy long-term climate change, but physically motivated compositional decompositions improve out-of-distribution performance with modest in-distribution trade-offs.

MAM-CLIP: Vision-Language Pretraining on Mammography Atlases for BI-RADS Classification

cs.CV · 2026-05-19 · conditional · novelty 6.0

Contrastive pretraining on mammography atlas image-text pairs improves BI-RADS classification F1 by 1-14%, especially in low-label regimes, and in some settings outperforms an equivalent number of direct labels.

PIXLRelight: Controllable Relighting via Intrinsic Conditioning

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

PIXLRelight is a transformer-based neural renderer that transfers arbitrary PBR lighting to single images via shared intrinsic conditioning extracted from both multi-illumination photos and path-traced coarse 3D renders.

FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Linear mappings in feature space can reconstruct a wide range of image manipulations including semantic edits, suggesting that feature representations are approximately linearly organized.

When Labels Have Structure: Improving Image Classification with Hierarchy-Aware Cross-Entropy

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Hierarchy-Aware Cross-Entropy improves image classification by incorporating class hierarchies into the loss through prediction aggregation and ancestral label smoothing, achieving mean accuracy gains of 4.66% in end-to-end training and 2.18% in linear probing.

Linear-Time Global Visual Modeling without Explicit Attention

cs.CV · 2026-05-03 · unverdicted · novelty 6.0

Dynamic parameterization of standard layers can replace explicit attention for linear-time global visual modeling.

CXR-LT 2026 Challenge: Multi-Center Long-Tailed and Zero Shot Chest X-ray Classification

cs.CV · 2026-04-16 · accept · novelty 6.0

CXR-LT 2026 introduces a radiologist-annotated multi-center dataset of 145k+ CXRs to benchmark robust multi-label classification on known classes and open-world generalization to unseen rare diseases.

StableTTA: Improving Vision Model Performance by Training-free Test-Time Adaptation Methods

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

StableTTA improves ImageNet-1K accuracy across 71 vision models by stabilizing logit aggregation under coherent-batch inference and enabling efficient single-forward-pass adaptation.

TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses

cs.CV · 2025-09-26 · unverdicted · novelty 6.0

TRUST is a test-time adaptation method for SSM vision models that uses uncertainty-guided traversal permutations to refine Mamba parameters via pseudo-labels and weight averaging, improving robustness on distribution shifts.

Stimulus symmetries can confound representational similarity analyses

q-bio.NC · 2026-05-20 · unverdicted · novelty 5.0

Stimulus symmetries render many neural representations functionally equivalent yet produce qualitatively different RSMs, including drifting ones from SGD or regularization in image-encoding networks.

ShellfishNet: A Domain-Specific Benchmark for Visual Recognition of Marine Molluscs

cs.CV · 2026-05-08 · unverdicted · novelty 5.0

ShellfishNet is a new benchmark of 8,691 images across 32 mollusc taxa for evaluating vision models on real-world underwater ecological monitoring tasks including robustness to degradation.

Beyond ViT Tokens: Masked-Diffusion Pretrained Convolutional Pathology Foundation Model for Cell-Level Dense Prediction

cs.CV · 2026-05-08 · unverdicted · novelty 5.0

A masked-diffusion pretrained convolutional model outperforms ViT pathology foundation models on cell-level dense prediction tasks in histology.

citing papers explorer

Rotation Equivariant Mamba for Vision Tasks cs.CV · 2026-03-10 · unverdicted · none · ref 6
MAPS: A Synthetic Dataset for Probing Vision Models in a Controlled 3D Scene Space cs.CV · 2026-05-19 · unverdicted · none · ref 42
Can Graphs Help Vision SSMs See Better? cs.CV · 2026-05-11 · unverdicted · none · ref 37
No Epoch Like the Present: Robust Climate Emulation Requires Out-of-Distribution Generalisation cs.LG · 2026-05-21 · conditional · none · ref 75
MAM-CLIP: Vision-Language Pretraining on Mammography Atlases for BI-RADS Classification cs.CV · 2026-05-19 · conditional · none · ref 17
PIXLRelight: Controllable Relighting via Intrinsic Conditioning cs.CV · 2026-05-18 · unverdicted · none · ref 31
FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry cs.LG · 2026-05-11 · unverdicted · none · ref 27
When Labels Have Structure: Improving Image Classification with Hierarchy-Aware Cross-Entropy cs.LG · 2026-05-07 · unverdicted · none · ref 9
Linear-Time Global Visual Modeling without Explicit Attention cs.CV · 2026-05-03 · unverdicted · none · ref 24
CXR-LT 2026 Challenge: Multi-Center Long-Tailed and Zero Shot Chest X-ray Classification cs.CV · 2026-04-16 · accept · none · ref 26
StableTTA: Improving Vision Model Performance by Training-free Test-Time Adaptation Methods cs.CV · 2026-04-06 · unverdicted · none · ref 18
TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses cs.CV · 2025-09-26 · unverdicted · none · ref 5
Stimulus symmetries can confound representational similarity analyses q-bio.NC · 2026-05-20 · unverdicted · none · ref 25
ShellfishNet: A Domain-Specific Benchmark for Visual Recognition of Marine Molluscs cs.CV · 2026-05-08 · unverdicted · none · ref 64
Beyond ViT Tokens: Masked-Diffusion Pretrained Convolutional Pathology Foundation Model for Cell-Level Dense Prediction cs.CV · 2026-05-08 · unverdicted · none · ref 30