Swin Transformer reaches 87.3% ImageNet accuracy and sets new records on COCO detection and ADE20K segmentation by replacing global self-attention with shifted-window local attention inside a hierarchical pyramid.
mixup: Beyond Empirical Risk Minimization
Mixed citation behavior. Most common role is background (47%).
abstract
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
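The convex-combination step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function name `mixup_batch`, the default `alpha=0.2`, the Beta(alpha, alpha) sampling of the mixing weight, and the pairing of examples by a random permutation of the batch are all assumptions made for illustration. The abstract itself only states that training uses convex combinations of example pairs and their labels.

```python
import numpy as np


def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Return convex combinations of a batch with a shuffled copy of itself.

    x:        array of shape (batch, ...) holding the inputs
    y_onehot: array of shape (batch, num_classes) holding one-hot labels
    alpha:    Beta distribution parameter (assumed value, not from the page)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Mixing weight in [0, 1]; drawing it from Beta(alpha, alpha) is an assumption.
    lam = rng.beta(alpha, alpha)
    # Pair each example with another example from the same batch.
    perm = rng.permutation(len(x))
    # The same weight mixes both the inputs and the labels.
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_mixed, lam


# Usage on a toy batch of 4 examples with 3 features and 2 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
y = np.eye(2)[[0, 1, 1, 0]]
x_mixed, y_mixed, lam = mixup_batch(x, y, alpha=0.2, rng=rng)
```

Because the labels are mixed with the same weight as the inputs, each mixed label row still sums to 1 and can be used as a soft target in a cross-entropy loss.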
representative citing papers
TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.
An efficiently computable HS-Jacobian acts as a conservative mapping for projections onto polyhedral sets, supporting provably convergent Adam-based end-to-end training of linearly constrained deep neural networks.
LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.
LookWhen factorizes video recognition into learning when, where, and what to compute via uniqueness-based token selection and dual-teacher distillation, achieving better accuracy-FLOPs trade-offs than baselines on multiple datasets.
PARSE improves domain generalization accuracy by factoring recognition into visual primitives and their spatial relational compositions learned end-to-end with differentiable predicates.
LEGO uses multiple generator-specific LoRA modules modulated by an MLP and fused with attention to detect synthetic images, achieving better performance than prior methods while using under 10% of the training data.
SignMAE uses segmentation-driven masking in a mask-and-reconstruct self-supervised task to learn fine-grained sign representations, achieving state-of-the-art accuracy on WLASL, NMFs-CSL, and Slovo with fewer frames and modalities.
A replay method for continual face forgery detection condenses real-fake distribution discrepancies into compact maps and synthesizes compatible samples from current real faces to reduce forgetting under tight memory budgets without storing historical images.
Machine unlearning conflates reversing the influence of specific training examples (untraining) with removing the full underlying distribution or behavior (unlearning).
DREAM introduces Masking Warmup and Semantically Aligned Decoding to let a single encoder handle both contrastive alignment and masked generation, yielding gains over CLIP and FLUID on understanding and generation benchmarks.
ST-BCP tightens the coverage bound in Backward Conformal Prediction by applying a computable data-dependent transformation to nonconformity scores, reducing the average gap from 4.20% to 1.12% on benchmarks while proving superiority over the identity baseline.
Chronos pretrains transformer models on tokenized time series to deliver strong zero-shot forecasting across diverse domains.
The DFDC dataset is the largest public collection of face-swapped videos and supports detectors that generalize to in-the-wild deepfakes.
Lightweight CNN with separable convolutions, hierarchical augmentation and power-based label smoothing reaches 84% cross-environment beam prediction accuracy on two real DeepSense 6G scenarios while cutting parameters by 52x and complexity by 79x versus ResNet.
MedDiffuseMix uses classifier saliency maps to restrict diffusion-based mixing to non-diagnostic areas of medical images, yielding accuracy, F1, and AUC gains over standard, Mixup, and diffusion baselines on four public datasets.
Unsupervised symmetry discovery via shallow group-convolutional networks recovers latent domains from linear measurements of random fields by learning symmetry actions under stationarity and locality constraints.
MindAlign decodes inner speech from fMRI via subject-specific neural-semantic alignment into a multimodal space followed by prompting of a frozen LM, outperforming baselines and generalizing across subjects.
Training-time augmentations in token noise, permutation, and offset categories reduce overfitting and improve minimum validation loss in multi-epoch autoregressive pretraining on fixed corpora.
SNR-ST-Mix is a geometry- and expression-aware mixup augmentation that constrains interpolation to k-nearest spatial neighbors and weights by transcriptomic similarity for spatial transcriptomics imputation.
A causality-inspired FedDG framework with device style intervention network, counterfactual text augmentation, and gradient alignment outperforms baselines on leave-one-device-out validation for RSC on ICBHI and SPRSound datasets.
Representation-conditioned diffusion models generate synthetic ImageNet data that trains classifiers to higher top-1 accuracy than class-conditioned generation (+10.76 pp) or real data (+2.0 pp when scaled).
GAMR introduces geometric-aware manifold regularization via virtual outlier synthesis to enhance intra-class compactness and inter-class separation, improving robustness to noisy labels beyond passive sample filtering.
HamBR uses Spherical HMC to probe ambiguous regions and synthesize virtual outliers with energy-based repulsion to restore decision boundaries degraded by noisy labels, achieving SOTA on CIFAR and real-world benchmarks.
citing papers explorer
- Personalized Generative Models for Contextual Debiasing
DecoupleGen personalizes diffusion models to create images with uncommon contexts for debiasing object recognition, yielding consistent gains on scene classification tasks.
- Noise-Robust Financial Numerical Entity Attribute Tagging
NORA applies task-aware weighting and NPK filtering to handle label noise in multi-attribute tagging of financial numerical entities, outperforming baselines on a new 6.6M-instance benchmark.
- FDDet: Achieving Data-Efficient Food Defect Detection Under Real-World Scenarios
FDDet is a semi-supervised object detection framework with BBoxMixUp and CGPC that outperforms standard detectors on the new FDD-48 food defect dataset under data-limited real-world conditions.
- Holistic Reliability Propagation: Decoupling Annotation and Prediction for Robust Noisy-Label
HRP decouples annotation reliability (alpha) and pseudo-label reliability (beta) via bilevel meta-learning and routes them to distinct objectives in reliability-aware Mixup and contrastive learning for improved noisy-label robustness.
- Axiomatizing Neural Networks via Pursuit of Subspaces
Authors introduce the Pursuit of Subspaces (PoS) hypothesis, an axiomatic geometric framework that unifies explanations for representation, computation, and generalization in shallow and deep neural networks.
- Graph Transductive Sharpening: Leveraging Unlabeled Predictions in Node Classification
Transductive Sharpening adds an entropy-minimization term on unlabeled-node predictions to the training objective for graph node classification.
- CAST: Channel-Aware Spatial Transfer Learning with Pseudo-Image Radar for Sign Language Recognition
CAST achieves 80.5% Top-1 accuracy on radar-only sign language recognition by fusing physics-aware CVD and RTM representations through channel-aware spatial attention and asymmetric cross-attention.
- Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models
Agentic AI systems are required to overcome the parameter coverage ceiling that prevents foundation models from handling certain out-of-distribution cases.
- HiMix: Hierarchical Artifact-aware Mixup for Generalized Synthetic Image Detection
HiMix combines mixup augmentation to create transitional real-fake samples with hierarchical global-local artifact feature fusion to achieve better generalization in detecting AI-generated images from unseen generators.
- Investigating Bias and Fairness in Appearance-based Gaze Estimation
First large-scale fairness audit of gaze estimators reveals sizable accuracy disparities by ethnicity and gender, with existing mitigation methods providing only marginal fairness gains.
- Beyond Surface Artifacts: Capturing Shared Latent Forgery Knowledge Across Modalities
Introduces MAF framework and DeepModal-Bench to capture universal cross-modal forgery traces for better generalization in multimodal deepfake detection.
- Multi-Aspect Knowledge Distillation for Language Model with Low-rank Factorization
MaKD distills pre-trained language models by deeply mimicking self-attention and feed-forward modules across aspects using low-rank factorization, matching strong baselines at the same parameter budget and extending to auto-regressive models.
- Why Invariance is Not Enough for Biomedical Domain Generalization and How to Fix It
MaskGen improves domain generalization for biomedical image segmentation by using source intensities plus domain-stable foundation model representations with minimal added complexity.
- Ordinal Adaptive Correction: A Data-Centric Approach to Ordinal Image Classification with Noisy Labels
ORDAC adaptively corrects noisy ordinal labels via dynamic label distribution adjustments, yielding lower error and higher recall on noisy Adience and Diabetic Retinopathy benchmarks.
- Two-Stage Framework for Efficient UAV-Based Wildfire Video Analysis with Adaptive Compression and Fire Source Detection
A two-stage UAV framework prunes redundant wildfire video clips via a policy network with station point mechanism and detects fire sources in real time using an improved YOLOv8 model.
- i-WiViG: Interpretable Window Vision GNN
i-WiViG is an interpretable window vision GNN that constrains nodes to disjoint local windows and applies learnable sparse attention to identify relevant subgraphs, delivering competitive performance on scene classification and regression with natural and remote-sensing images.
- Automatic Dataset Construction (ADC): Sample Collection, Data Curation, and Beyond
The ADC method automates the creation of large image classification datasets using LLMs and search engines, achieving 79% human agreement and reducing label noise on a 1 million image clothing dataset, while also releasing benchmarks for noise and bias issues.
- YOLOv4: Optimal Speed and Accuracy of Object Detection
YOLOv4 achieves 43.5% AP (65.7% AP50) on MS COCO at ~65 FPS on Tesla V100 by integrating WRC, CSP, CmBN, SAT, Mish activation, Mosaic augmentation, DropBlock, and CIoU loss.
- Annotation-Free Cardiac Vessel Segmentation via Knowledge Transfer from Retinal Images
SC-GAN performs annotation-free coronary artery segmentation by transferring shape-consistent knowledge from retinal vessel annotations via a GAN trained on 1092 DSA images.
- The Receptive Field as a Regularizer in Deep Convolutional Neural Networks for Acoustic Scene Classification
Tuning receptive field sizes in ResNet and DenseNet enables them to outperform VGG models on acoustic scene classification across three datasets.
- Efficient data augmentation using graph imputation neural networks
Graph imputation neural networks augment semi-supervised datasets up to 10x by reconstructing heavily damaged samples on a similarity graph, improving over fully-supervised baselines on benchmarks.
- PRISM: Prioritized Channel Importance with Semi-supervised Domain Adaptation for Cross-Subject EEG Emotion Recognition
PRISM combines data-dependent channel weighting via expert ensemble and confidence-filtered pseudo-label domain adaptation to outperform prior methods on cross-subject EEG emotion tasks in DEAP, DREAMER, and SEED.
- Improving Combined Detection and Classification of TEM Defects via Mask-Conditioned Latent Diffusion Augmentation
Mask-conditioned LDM generates synthetic TEM defect image-mask pairs that augment small experimental sets and produce up to 0.02 gain in harmonic-mean F1 for combined detection and classification with Mask R-CNN.
- An interpretable vision transformer framework for automated brain tumor classification
Vision Transformer with CLAHE preprocessing, two-stage fine-tuning, MixUp/CutMix, EMA, TTA, and attention rollout achieves 99.29% accuracy and 99.25% macro F1 on four-class brain tumor MRI classification from 7023 scans.
- A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence
A conditional Wasserstein GAN generates plausible future SWI drought trajectories for French insurance risk management under climate change.
- PR3DICTR: A modular AI framework for medical 3D image-based detection and outcome prediction
PR3DICTR is a new open-access modular framework for 3D medical image classification and outcome prediction that works with as little as two lines of code.
- CLIP the Landscape: Automated Tagging of Crowdsourced Landscape Images
A lightweight multi-modal CLIP pipeline predicts exact-match geographical tags on a Kaggle subset of the Geograph crowdsourced image archive by fusing image, location, and title embeddings.
- Attention based Convolutional Recurrent Neural Network for Environmental Sound Classification
A CRNN model with frame-level attention achieves state-of-the-art accuracy on ESC-10 and ESC-50 environmental sound classification datasets.
- Rethinking Text-to-Image as Semantic-Aware Data Augmentation for Indoor Scene Recognition
Stable Diffusion augments limited indoor scene datasets for better recognition models, and DIRE detects the generated images with 100% accuracy using lightweight classifiers.
- CellNet -- Localizing Cells using Sparse and Noisy Point Annotations
CellNet applies regression-based deep learning to count cells from sparse point annotations in microscopy images and claims better performance than zero-shot methods in low-data regimes.
- Optimizing 2D Input Representations and Sub-phase Fusion Strategies for Differential Diagnosis of Asthma and COPD Using CNN- and GRU-Based Networks
MFCC matrices with 13 coefficients and adaptive windowing plus direct concatenation outperform log-mel spectrograms and VAR models for asthma-COPD classification, reaching F1 scores of 0.877 (cycle) and 0.855 (subject).
- The General Theory of Localization Methods
The localization method is presented as a unifying framework connecting kernel methods, MeanShift, Hopfield networks, LLE, fuzzy inference, denoising autoencoders, and Transformers via local models and the localization trick.
- SleepNet and DreamNet: Enriching and Reconstructing Representations for Consolidated Visual Classification
SleepNet and DreamNet enrich visual features via supervised pre-trained encoders and reconstruct hidden states with encoder-decoder frameworks to outperform prior state-of-the-art classifiers.
- HODGEPODGE: Sound event detection based on ensemble of semi-supervised learning methods
An ensemble of CRNNs trained with consistency regularization and MixUp on mixed labeled/unlabeled data reaches 42.0% event-based F-measure on DCASE 2019 Task 4, beating the 25.8% baseline.
- Image-Based Malware Type Classification on MalNet-Image Tiny: Effects of Multi-Scale Fusion, Transfer Learning, Data Augmentation, and Schedule-Free Optimization
Pretraining plus Mixup/TrivialAugment and a feature pyramid network lift macro-F1 from 0.65 to 0.69 on 43-class malware image classification while cutting training epochs from 96 to 10.
- Know Yourself Better: Diverse Object-Related Features Improve Open Set Recognition