Recognition: no theorem link
StableTTA: Improving Vision Model Performance by Training-free Test-Time Adaptation Methods
Pith reviewed 2026-05-10 19:36 UTC · model grok-4.3
The pith
A training-free method stabilizes logit aggregation to improve vision model accuracy at test time, with no retraining or parameter updates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
StableTTA addresses both efficiency challenges and aggregation inconsistency with two training-free test-time adaptation variants. StableTTA-I targets coherent-batch inference settings and improves prediction consistency and accuracy through variance-aware logit aggregation. StableTTA-II introduces feature-level cropping, which enables efficient logit aggregation with a single forward pass on a single model backbone. Experiments on ImageNet-1K across 71 models show that StableTTA-I consistently improves accuracy under coherent-batch inference, while StableTTA-II delivers lightweight, architecture-agnostic gains with minimal overhead.
What carries the argument
Variance-aware logit aggregation combined with feature-level cropping, which together stabilize ensemble outputs and reduce the number of forward passes needed during inference.
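The abstract does not give the aggregation formula (the referee's minor comment below makes the same point), so the following is only a plausible sketch of variance-aware logit aggregation, not the authors' method: each member of a coherent batch is downweighted in proportion to how far its logits deviate from the batch mean, so unstable members contribute less. The function name and the exact weighting rule are assumptions.

```python
import numpy as np

def variance_aware_aggregate(logits: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Aggregate a coherent batch of logit vectors into one prediction.

    logits: (N, C) array, one row per batch member; returns a (C,) vector.
    Members whose logits deviate strongly from the batch mean are treated
    as unstable and downweighted (inverse squared deviation) -- one
    plausible reading of "variance-aware", not the published rule.
    """
    mean = logits.mean(axis=0)                      # (C,) batch-mean logits
    dev = ((logits - mean) ** 2).sum(axis=1)        # (N,) per-member deviation
    weights = 1.0 / (dev + eps)                     # unstable members count less
    weights /= weights.sum()
    return (weights[:, None] * logits).sum(axis=0)  # weighted logit average

# Usage: one shared prediction for a batch assumed to share a class.
batch_logits = np.random.randn(8, 1000)             # e.g. 8 frames, 1000 classes
prediction = int(np.argmax(variance_aware_aggregate(batch_logits)))
```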
If this is right
- StableTTA-I substantially improves prediction consistency and accuracy under coherent-batch inference such as video streams, burst photography, robotics perception, and industrial inspection.
- StableTTA-II enables efficient logit aggregation with a single forward pass and minimal computational overhead while remaining architecture-agnostic (a sketch of one plausible feature-level cropping scheme follows this list).
- Both variants operate without any model training or parameter updates and apply across a wide range of vision models.
- Inference-time semantic coherence and aggregation stability offer practical perspectives for improving test-time adaptation systems.
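Because the abstract names feature-level cropping but not its layout, the following PyTorch sketch illustrates the single-forward-pass idea referenced in the list above: run the backbone once, crop the final feature map rather than the input image, and reuse the classifier head on each pooled crop. The ResNet-50 backbone, the corner-plus-center crop pattern, and all shapes are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
backbone = torch.nn.Sequential(*list(model.children())[:-2])  # up to final conv map
head = model.fc                                                # 2048 -> 1000 linear

@torch.no_grad()
def feature_crop_logits(x: torch.Tensor) -> torch.Tensor:
    """One backbone pass, several logit vectors via feature-map crops.

    x: (1, 3, H, W) image tensor. Assumes the final feature map is at
    least 4x4; the crop layout (4 corners + center + full map) is an
    illustrative choice.
    """
    fmap = backbone(x)                     # (1, 2048, h, w): the only forward pass
    _, _, h, w = fmap.shape
    ch, cw = h // 2, w // 2                # each crop covers half the map
    crops = [
        fmap[..., :ch, :cw], fmap[..., :ch, -cw:],    # top corners
        fmap[..., -ch:, :cw], fmap[..., -ch:, -cw:],  # bottom corners
        fmap[..., h // 4:h // 4 + ch, w // 4:w // 4 + cw],  # center
        fmap,                                          # full map
    ]
    # Pool each crop and reuse the same classifier head, then average logits.
    logits = [head(F.adaptive_avg_pool2d(c, 1).flatten(1)) for c in crops]
    return torch.stack(logits).mean(dim=0)             # aggregated (1, 1000)

# Usage: logits = feature_crop_logits(torch.randn(1, 3, 224, 224))
```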
Where Pith is reading between the lines
- If coherence between nearby inputs is common in many real deployments, these techniques could serve as a default lightweight post-processing step for deployed vision systems.
- The same stability principle might extend to other multi-input aggregation tasks such as multi-view 3D reconstruction or temporal sensor fusion.
- Combining StableTTA variants with existing domain-adaptation methods could be tested to measure additive gains when both coherence and distribution shift are present.
Load-bearing premise
Temporally or semantically adjacent observations are likely to belong to the same class in the target deployment settings.
What would settle it
Evaluating the method on a test set of randomly ordered images where adjacent samples come from unrelated classes would show no accuracy gain or a drop for StableTTA-I.
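A minimal sketch of that control, assuming per-image logits and labels are precomputed and reusing the variance_aware_aggregate sketch above as a stand-in for StableTTA-I; all names are hypothetical:

```python
import numpy as np

def shuffled_batch_control(logits, labels, batch_size=8, seed=0):
    """Shuffled-batch control for the coherence assumption.

    logits: (N, C) precomputed per-image logits; labels: (N,) ints.
    Batches drawn from a random ordering mix unrelated classes, so a
    coherence-based aggregator should lose, not gain, accuracy here.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))              # destroy any coherence
    per_sample_acc = (logits.argmax(1) == labels).mean()
    hits, total = 0, 0
    for i in range(0, len(order) - batch_size + 1, batch_size):
        idx = order[i:i + batch_size]
        shared = variance_aware_aggregate(logits[idx]).argmax()
        hits += int((labels[idx] == shared).sum())    # one prediction per batch
        total += batch_size
    return per_sample_acc, hits / total               # expect the second to drop
```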
Original abstract
Ensemble methods improve predictive performance but often incur high memory and computational costs. We identify an aggregation instability induced by nonlinear projection and voting operations. To address both efficiency challenges and this inconsistency, we propose StableTTA, a training-free test-time adaptation method with two variants. StableTTA-I targets coherent-batch inference settings, where temporally or semantically adjacent observations are likely to belong to the same class. Examples include burst photography, video streams, robotics perception, and industrial inspection. Under coherent-batch inference, StableTTA-I substantially improves prediction consistency and accuracy through variance-aware logit aggregation. StableTTA-II establishes feature-level cropping, enabling efficient logit aggregation with a single forward pass on a single model backbone. Experiments on ImageNet-1K across 71 models demonstrate that StableTTA-I consistently improves prediction accuracy under coherent-batch inference, while StableTTA-II provides lightweight and architecture-agnostic accuracy improvements with minimal computational overhead. These results suggest that inference-time semantic coherence and aggregation stability provide useful perspectives for improving practical test-time adaptation systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces StableTTA, a training-free test-time adaptation method with two variants for vision models. StableTTA-I applies variance-aware logit aggregation to improve prediction consistency and accuracy under coherent-batch inference (where temporally or semantically adjacent observations are assumed likely to share the same class, as in video streams or robotics). StableTTA-II uses feature-level cropping to enable efficient logit aggregation via a single forward pass on one backbone. Experiments on ImageNet-1K across 71 models are reported to show consistent accuracy gains for StableTTA-I under coherent-batch settings and lightweight improvements for StableTTA-II with minimal overhead.
Significance. If the empirical gains hold under realistic conditions, the work offers a practical, low-cost perspective on stabilizing ensemble-style aggregation at inference time without retraining or architecture changes. The training-free design and focus on semantic coherence in batches could be useful for deployment scenarios like video or burst photography, provided the coherence assumption transfers beyond idealized test conditions.
Major comments (2)
- [Experiments] The abstract and experimental description do not specify how coherent batches are constructed on ImageNet-1K (e.g., whether they are formed by perfectly class-homogeneous blocks or by temporally adjacent samples with possible transitions). If the former, the variance reduction in StableTTA-I is maximized artificially and the reported gains may not generalize to the target settings (video streams, robotics) that exhibit gradual class changes and label noise.
- [Experiments] No details are provided on statistical significance testing, error bars, variance across runs, or exact baseline implementations (including how standard logit averaging or voting is performed). Without these, the claim of 'consistent' improvements across 71 models cannot be assessed for robustness.
Minor comments (1)
- [Abstract] The abstract refers to 'nonlinear projection and voting operations' inducing instability but does not define these operations or the precise aggregation formula used in StableTTA-I.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the two major comments on the experimental setup below and will incorporate clarifications and additional details in the revised version to improve transparency and robustness assessment.
Point-by-point responses
Referee: [Experiments] The abstract and experimental description do not specify how coherent batches are constructed on ImageNet-1K (e.g., whether they are formed by perfectly class-homogeneous blocks or by temporally adjacent samples with possible transitions). If the former, the variance reduction in StableTTA-I is maximized artificially and the reported gains may not generalize to the target settings (video streams, robotics) that exhibit gradual class changes and label noise.
Authors: We agree that the batch construction procedure requires explicit description. Coherent batches on ImageNet-1K were formed by sorting the validation set by ground-truth class labels and extracting contiguous blocks of same-class samples to simulate semantic adjacency under the coherence assumption stated in the paper. This design isolates the benefit of variance-aware logit aggregation without introducing label noise. We acknowledge that perfectly homogeneous blocks represent an idealized case and may overestimate gains relative to video streams with gradual transitions. In the revision we will add a dedicated subsection detailing the exact batch construction algorithm, discuss its relation to target applications, and include new experiments on partially coherent batches that incorporate controlled class transitions and label noise to better evaluate generalization. Revision: yes.
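A minimal sketch of the batch construction described in this response, under the added assumption that blocks straddling a class boundary are simply discarded (the exact handling is not stated):

```python
import numpy as np

def coherent_batches(labels: np.ndarray, batch_size: int = 8):
    """Class-homogeneous batches from a labeled evaluation set.

    Sorts indices by ground-truth label and cuts contiguous same-class
    blocks, as in the authors' description; dropping boundary-straddling
    blocks is an illustrative assumption.
    """
    order = np.argsort(labels, kind="stable")         # group samples by class
    batches = []
    for i in range(0, len(order) - batch_size + 1, batch_size):
        idx = order[i:i + batch_size]
        if labels[idx[0]] == labels[idx[-1]]:         # pure block (labels sorted)
            batches.append(idx)
    return batches
```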
Referee: [Experiments] No details are provided on statistical significance testing, error bars, variance across runs, or exact baseline implementations (including how standard logit averaging or voting is performed). Without these, the claim of 'consistent' improvements across 71 models cannot be assessed for robustness.
Authors: We accept that additional statistical and implementation details are necessary for full assessment. The baselines were implemented as follows: standard logit averaging computes the mean logit vector over the batch before applying softmax; voting aggregates the argmax predictions via majority vote. All 71 models showed accuracy gains under StableTTA-I, but we did not report run-to-run variance or significance tests. In the revised manuscript we will (i) provide pseudocode for every baseline and our methods, (ii) report mean accuracy with standard deviation across three random seeds for the subset of models where stochasticity exists, (iii) include error bars on all bar plots, and (iv) add paired t-test p-values comparing StableTTA-I against the strongest baseline to quantify consistency. Revision: yes.
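The two baselines as described here admit a direct sketch; function names are hypothetical, and the softmax is retained only for fidelity to the description (it does not change the argmax):

```python
import numpy as np
from collections import Counter

def mean_logit_baseline(logits: np.ndarray) -> int:
    """Standard logit averaging: mean logit vector over the batch,
    then softmax."""
    mean = logits.mean(axis=0)
    probs = np.exp(mean - mean.max())
    probs /= probs.sum()                              # softmax (monotone in argmax)
    return int(probs.argmax())

def majority_vote_baseline(logits: np.ndarray) -> int:
    """Voting: per-member argmax predictions, then majority vote."""
    votes = logits.argmax(axis=1)
    return int(Counter(votes.tolist()).most_common(1)[0][0])
```

The promised paired t-test then amounts to scipy.stats.ttest_rel applied to per-model accuracy pairs for StableTTA-I and the strongest baseline across the 71 models.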
Circularity Check
No significant circularity; empirical method with no self-referential reductions
Full rationale
The paper introduces StableTTA as a training-free test-time adaptation approach based on variance-aware logit aggregation for coherent-batch settings and feature-level cropping for efficiency. No equations, derivations, or parameter-fitting steps are described that reduce by construction to the method's own inputs or outputs. The claims rest on empirical experiments across 71 models on ImageNet-1K rather than any self-citation chain, uniqueness theorem, or ansatz imported from prior author work. The derivation chain is self-contained as a set of heuristic aggregation rules validated externally through accuracy measurements.