Swin Transformer reaches 87.3% ImageNet accuracy and sets new records on COCO detection and ADE20K segmentation by replacing global self-attention with shifted-window local attention inside a hierarchical pyramid.
hub Mixed citations
mixup: Beyond Empirical Risk Minimization
Mixed citation behavior. Most common role is background (47%).
abstract
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
hub tools
citation-role summary
citation-polarity summary
claims ledger
- abstract Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neur
co-cited works
representative citing papers
TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.
An efficiently computable HS-Jacobian acts as a conservative mapping for projections onto polyhedral sets, supporting provably convergent Adam-based end-to-end training of linearly constrained deep neural networks.
LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.
LookWhen factorizes video recognition into learning when, where, and what to compute via uniqueness-based token selection and dual-teacher distillation, achieving better accuracy-FLOPs trade-offs than baselines on multiple datasets.
PARSE improves domain generalization accuracy by factoring recognition into visual primitives and their spatial relational compositions learned end-to-end with differentiable predicates.
LEGO uses multiple generator-specific LoRA modules modulated by an MLP and fused with attention to detect synthetic images, achieving better performance than prior methods while using under 10% of the training data.
SignMAE uses segmentation-driven masking in a mask-and-reconstruct self-supervised task to learn fine-grained sign representations, achieving state-of-the-art accuracy on WLASL, NMFs-CSL, and Slovo with fewer frames and modalities.
A replay method for continual face forgery detection condenses real-fake distribution discrepancies into compact maps and synthesizes compatible samples from current real faces to reduce forgetting under tight memory budgets without storing historical images.
Machine unlearning conflates reversing the influence of specific training examples (untraining) with removing the full underlying distribution or behavior (unlearning).
DREAM introduces Masking Warmup and Semantically Aligned Decoding to let a single encoder handle both contrastive alignment and masked generation, yielding gains over CLIP and FLUID on understanding and generation benchmarks.
ST-BCP tightens the coverage bound in Backward Conformal Prediction by applying a computable data-dependent transformation to nonconformity scores, reducing the average gap from 4.20% to 1.12% on benchmarks while proving superiority over the identity baseline.
Chronos pretrains transformer models on tokenized time series to deliver strong zero-shot forecasting across diverse domains.
The DFDC dataset is the largest public collection of face-swapped videos and supports detectors that generalize to in-the-wild deepfakes.
GAMR introduces geometric-aware manifold regularization via virtual outlier synthesis to enhance intra-class compactness and inter-class separation, improving robustness to noisy labels beyond passive sample filtering.
HamBR uses Spherical HMC to probe ambiguous regions and synthesize virtual outliers with energy-based repulsion to restore decision boundaries degraded by noisy labels, achieving SOTA on CIFAR and real-world benchmarks.
LiBaGS scores and selects synthetic data near decision boundaries using proximity, uncertainty, density, and validity, with boundary-gap allocation and marginal stopping to improve training accuracy.
CORF unifies domain generalization and class-incremental learning via selective sample refinement with spatial maps and confidence weighting plus cascaded relational distillation.
CircleID introduces a controlled dataset of 46,155 circles from 66 writers and 8 pens, with competition results showing top accuracies of 64.8% for open-set writer identification and 92.7% for pen classification.
Mixing real UAV imagery with 2101 AI-generated image-mask pairs improves semantic segmentation F1 scores for fine-grained forest species by over 15 percentage points overall and up to 30 points for rare classes.
CHCL aligns a Cheeger-Hodge joint signature across graph augmentations to produce embeddings that remain stable under local structural changes.
TranCLR models continuous skeleton action spaces with transitional anchors and multi-level manifold calibration, yielding smoother and more accurate representations than binary contrastive methods.
PAC-Bayes bounds for Gibbs posteriors are obtained via singular learning theory, producing explicit and tighter posterior-averaged risk bounds that adapt to data structure in overparameterized models.
HG-DTGL integrates human gaze as an extra teacher in mean-teacher learning via GazeMix, MGP module and Gaze Loss, reporting superior segmentation across ten organs on multiple modalities.
citing papers explorer
-
Is your algorithm unlearning or untraining?
Machine unlearning conflates reversing the influence of specific training examples (untraining) with removing the full underlying distribution or behavior (unlearning).
-
Chronos: Learning the Language of Time Series
Chronos pretrains transformer models on tokenized time series to deliver strong zero-shot forecasting across diverse domains.
-
LiBaGS: Lightweight Boundary Gap Synthesis for Targeted Synthetic Data Selection
LiBaGS scores and selects synthetic data near decision boundaries using proximity, uncertainty, density, and validity, with boundary-gap allocation and marginal stopping to improve training accuracy.
-
Cheeger--Hodge Contrastive Learning for Structurally Robust Graph Representation Learning
CHCL aligns a Cheeger-Hodge joint signature across graph augmentations to produce embeddings that remain stable under local structural changes.
-
Feature-Aware Anisotropic Local Differential Privacy for Utility-Preserving Graph Representation Learning in Metal Additive Manufacturing
FI-LDP-HGAT applies feature-importance-aware anisotropic local differential privacy to a hierarchical graph attention network, recovering 81.5% utility at epsilon=4 and 0.762 defect recall at epsilon=2 on a DED porosity dataset while outperforming standard LDP and DP-SGD baselines.
-
Can LLMs Learn to Reason Robustly under Noisy Supervision?
Online Label Refinement lets LLMs learn robust reasoning from noisy supervision by correcting labels when majority answers show rising rollout success and stable history, delivering 3-4% gains on math and reasoning benchmarks even at high noise levels.
-
LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios
LoFT uses parameter-efficient fine-tuning of foundation models for long-tailed semi-supervised learning, supported by proofs that this reduces hypothesis complexity to minimize balanced posterior error and compresses outlier acceptance regions, with LoFT-OW handling open-world OOD cases.
-
Sharpness-Aware Minimization for Efficiently Improving Generalization
SAM solves a min-max problem to locate flat low-loss regions, improving generalization on CIFAR, ImageNet and label-noise tasks.
-
Axiomatizing Neural Networks via Pursuit of Subspaces
Authors introduce the Pursuit of Subspaces (PoS) hypothesis, an axiomatic geometric framework that unifies explanations for representation, computation, and generalization in shallow and deep neural networks.
-
Graph Transductive Sharpening: Leveraging Unlabeled Predictions in Node Classification
Transductive Sharpening adds an entropy-minimization term on unlabeled-node predictions to the training objective for graph node classification.
-
Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models
Agentic AI systems are required to overcome the parameter coverage ceiling that prevents foundation models from handling certain out-of-distribution cases.
-
The Receptive Field as a Regularizer in Deep Convolutional Neural Networks for Acoustic Scene Classification
Tuning receptive field sizes in ResNet and DenseNet enables them to outperform VGG models on acoustic scene classification across three datasets.
-
A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence
A conditional Wasserstein GAN generates plausible future SWI drought trajectories for French insurance risk management under climate change.
-
SleepNet and DreamNet: Enriching and Reconstructing Representations for Consolidated Visual Classification
SleepNet and DreamNet enrich visual features via supervised pre-trained encoders and reconstruct hidden states with encoder-decoder frameworks to outperform prior state-of-the-art classifiers.
- The General Theory of Localization Methods