EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.
hub Mixed citations
Imagenet classification with deep convolutional neural networks.Advances in neural information processing systems, 25
Mixed citation behavior. Most common role is background (50%).
hub tools
citation-role summary
citation-polarity summary
representative citing papers
SubPopMark embeds verifiable subpopulation biases into distilled datasets via CVM and USTM optimization stages, allowing provenance inference through comparison of model output signatures against a reference behavior bank.
TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.
GraphScan replaces geometric or coordinate-based scanning in Vision SSMs with learned local semantic graph routing, yielding SOTA results among such models on classification and segmentation tasks.
Unimodal model representations converge to a relational structure captured by the Indra representation via V-enriched Yoneda embedding, which is unique and structure-preserving and improves cross-model and cross-modal robustness when instantiated with angular distance.
A human-centered OOD spectrum based on perceptual difficulty shows vision-language models align best with human errors across regimes, with CNNs stronger on near-OOD and ViTs on far-OOD.
GRAM is a latent-variable generative model that performs recursive reasoning via stochastic trajectories, trained with amortized variational inference to support multi-hypothesis reasoning and unconditional generation.
Classical momentum acceleration in mini-batch SGD for quadratics is proportional to batch size up to saturation, enabling perfect parallelization under minimal noise assumptions.
Prior-Aligned AutoEncoders shape latent manifolds with spatial coherence, local continuity, and global semantics to improve latent diffusion, achieving SOTA gFID 1.03 on ImageNet 256x256 with up to 13x faster convergence.
A bilevel method learns composite pretraining loss weights online via gradient alignment with a downstream objective, matching tuned baselines at roughly 30% extra cost over one training run.
HDSD decouples parameter subspaces in vision-language models via a Feature Modulation Module, General Fusion Module with adaptive thresholds, and Hierarchical Learning Module with SVD scaling to minimize cross-task interference and achieve state-of-the-art class-incremental learning performance.
Hierarchy-Aware Cross-Entropy improves image classification by incorporating class hierarchies into the loss through prediction aggregation and ancestral label smoothing, achieving mean accuracy gains of 4.66% in end-to-end training and 2.18% in linear probing.
CNNs achieve dimension-dependent Sobolev approximation rates on manifolds, and a spectral boundary loss using Laplace-Beltrami eigenmodes enables stable PINN solutions for elliptic problems with improved accuracy over standard approaches.
Rotation-equivariant convolutions and adaptive TL-Conv layers are added to I2I networks to preserve rotation symmetry and improve translation quality across domains.
SciTikZer-8B uses a new dataset, benchmark, and dual self-consistency RL to generate TikZ code for scientific graphics, outperforming much larger models like Gemini-2.5-Pro.
StableTTA improves ImageNet-1K accuracy across 71 vision models by stabilizing logit aggregation under coherent-batch inference and enabling efficient single-forward-pass adaptation.
CAST selects better multimodal coresets by fusing collapse-aware topologies across modalities and matching distributions at multiple scales in the diffusion wavelet domain.
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
citing papers explorer
-
Rotation Equivariant Mamba for Vision Tasks
EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.
-
From Compression to Accountability: Harmless Copyright Protection for Dataset Distillation
SubPopMark embeds verifiable subpopulation biases into distilled datasets via CVM and USTM optimization stages, allowing provenance inference through comparison of model output signatures against a reference behavior bank.
-
TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles
TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.
-
Can Graphs Help Vision SSMs See Better?
GraphScan replaces geometric or coordinate-based scanning in Vision SSMs with learned local semantic graph routing, yielding SOTA results among such models on classification and segmentation tasks.
-
The Indra Representation Hypothesis for Multimodal Alignment
Unimodal model representations converge to a relational structure captured by the Indra representation via V-enriched Yoneda embedding, which is unique and structure-preserving and improves cross-model and cross-modal robustness when instantiated with angular distance.
-
Do Machines Fail Like Humans? A Human-Centred Out-of-Distribution Spectrum for Mapping Error Alignment
A human-centered OOD spectrum based on perceptual difficulty shows vision-language models align best with human errors across regimes, with CNNs stronger on near-OOD and ViTs on far-OOD.
-
Generative Recursive Reasoning
GRAM is a latent-variable generative model that performs recursive reasoning via stochastic trajectories, trained with amortized variational inference to support multi-hypothesis reasoning and unconditional generation.
-
Perfect Parallelization in Mini-Batch SGD with Classical Momentum Acceleration
Classical momentum acceleration in mini-batch SGD for quadratics is proportional to batch size up to saturation, enabling perfect parallelization under minimal noise assumptions.
-
What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion
Prior-Aligned AutoEncoders shape latent manifolds with spatial coherence, local continuity, and global semantics to improve latent diffusion, achieving SOTA gFID 1.03 on ImageNet 256x256 with up to 13x faster convergence.
-
When Losses Align: Gradient-Based Composite Loss Weighting for Efficient Pretraining
A bilevel method learns composite pretraining loss weights online via gradient alignment with a downstream objective, matching tuned baselines at roughly 30% extra cost over one training run.
-
Hierarchical Dual-Subspace Decoupling for Continual Learning in Vision-Language Models
HDSD decouples parameter subspaces in vision-language models via a Feature Modulation Module, General Fusion Module with adaptive thresholds, and Hierarchical Learning Module with SVD scaling to minimize cross-task interference and achieve state-of-the-art class-incremental learning performance.
-
When Labels Have Structure: Improving Image Classification with Hierarchy-Aware Cross-Entropy
Hierarchy-Aware Cross-Entropy improves image classification by incorporating class hierarchies into the loss through prediction aggregation and ancestral label smoothing, achieving mean accuracy gains of 4.66% in end-to-end training and 2.18% in linear probing.
-
Simultaneous CNN Approximation on Manifolds with Applications to Boundary Value Problems
CNNs achieve dimension-dependent Sobolev approximation rates on manifolds, and a spectral boundary loss using Laplace-Beltrami eigenmodes enables stable PINN solutions for elliptic problems with improved accuracy over standard approaches.
-
Image-to-Image Translation Framework Embedded with Rotation Symmetry Priors
Rotation-equivariant convolutions and adaptive TL-Conv layers are added to I2I networks to preserve rotation symmetry and improve translation quality across domains.
-
Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning
SciTikZer-8B uses a new dataset, benchmark, and dual self-consistency RL to generate TikZ code for scientific graphics, outperforming much larger models like Gemini-2.5-Pro.
-
StableTTA: Improving Vision Model Performance by Training-free Test-Time Adaptation Methods
StableTTA improves ImageNet-1K accuracy across 71 vision models by stabilizing logit aggregation under coherent-batch inference and enabling efficient single-forward-pass adaptation.
-
CAST: Collapse-Aware multi-Scale Topology Fusion for Multimodal Coreset Selection
CAST selects better multimodal coresets by fusing collapse-aware topologies across modalities and matching distributions at multiple scales in the diffusion wavelet domain.
-
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
- From Per-Image Low-Rank to Encoding Mismatch: Rethinking Feature Distillation in Vision Transformers