pith. machine review for the scientific record.

arxiv: 2604.11590 · v1 · submitted 2026-04-13 · 💻 cs.CV

Recognition: unknown

Learning Robustness at Test-Time from a Non-Robust Teacher

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:30 UTC · model grok-4.3

classification 💻 cs.CV
keywords test-time adaptation · adversarial robustness · unsupervised adaptation · label-free learning · model adaptation · robustness-accuracy trade-off

The pith

A label-free framework uses a non-robust teacher's predictions as fixed anchors to stabilize adversarial-robustness adaptation at test time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether a pretrained model without built-in adversarial robustness can still be adapted at test time using only a small number of unlabeled target samples to improve its resistance to attacks. Straightforward extensions of classical adversarial training to this unsupervised setting turn out to be unstable and overly sensitive to hyperparameter choices when the starting teacher model itself is non-robust. The authors introduce a framework that treats the teacher's output predictions as fixed semantic anchors for both the clean and adversarial objectives during adaptation. Theoretical analysis shows this choice yields a more stable alternative to the self-consistency regularization typical in adversarial training. Experiments on CIFAR-10 and ImageNet with photometric shifts confirm gains in optimization stability, reduced hyperparameter sensitivity, and an improved robustness-accuracy trade-off.

Core claim

The proposed label-free framework uses the predictions of a non-robust teacher model as a semantic anchor for both the clean and adversarial objectives during test-time adaptation. This formulation provides a more stable alternative to self-consistency-based regularization in classical adversarial training, as shown by theoretical insights on optimization behavior. On CIFAR-10 and ImageNet under induced photometric transformations, the method achieves improved optimization stability, lower sensitivity to parameter choices, and a better robustness-accuracy trade-off than distillation-based baselines in the unsupervised post-deployment setting.

What carries the argument

The label-free framework that uses non-robust teacher predictions as semantic anchors for both clean and adversarial objectives during unsupervised test-time adaptation.
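To make the mechanism concrete, here is a minimal PyTorch-style sketch of what an anchor-based adaptation objective of this kind could look like. The KL formulation, the PGD inner attack, the β weighting, and all names are editorial assumptions inferred from the abstract and figure captions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def anchor_adaptation_loss(student, teacher, x, eps=8/255, alpha=2/255,
                           attack_steps=10, beta=1.0):
    """Illustrative anchor-based test-time adaptation loss (assumed form).

    The frozen non-robust teacher's predictions on the CLEAN input act as
    a fixed semantic anchor for both objectives; TRADES-style training
    would instead pull the adversarial branch toward the student's own
    moving clean predictions (self-consistency).
    """
    with torch.no_grad():
        anchor = F.softmax(teacher(x), dim=1)  # fixed target, no gradient

    # Craft adversarial examples on the student (PGD is one common choice).
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(attack_steps):
        x_adv = x_adv.detach().requires_grad_(True)
        kl = F.kl_div(F.log_softmax(student(x_adv), dim=1), anchor,
                      reduction="batchmean")
        grad, = torch.autograd.grad(kl, x_adv)
        x_adv = x_adv + alpha * grad.sign()          # ascend: maximize divergence
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)

    clean_term = F.kl_div(F.log_softmax(student(x), dim=1), anchor,
                          reduction="batchmean")
    robust_term = F.kl_div(F.log_softmax(student(x_adv.detach()), dim=1),
                           anchor, reduction="batchmean")
    return clean_term + beta * robust_term
```

The contrast with self-consistency is the `anchor` tensor: it is computed once per batch from the frozen teacher and never tracks the student's drift during adaptation.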

If this is right

  • The method delivers improved optimization stability relative to straightforward distillation-based adaptations of adversarial training.
  • It exhibits lower sensitivity to hyperparameter choices than self-consistency regularization approaches.
  • It produces a superior robustness-accuracy trade-off when adapting models in the unsupervised test-time setting with limited target samples.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The anchoring idea could be tested on distribution shifts other than photometric transformations to check broader applicability.
  • The stability analysis might extend to other forms of regularization used in robustness training.
  • In practice the approach could support post-deployment updates for models facing new environments without requiring labeled data or robust pretraining.

Load-bearing premise

The predictions of the non-robust teacher model provide a reliable semantic anchor for both clean and adversarial objectives during adaptation even though the teacher itself lacks robustness.
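This premise is directly measurable. A minimal sketch, assuming a frozen PyTorch teacher and adversarial inputs already generated during adaptation; `teacher_flip_rate` is a hypothetical helper, not code from the paper:

```python
import torch

@torch.no_grad()
def teacher_flip_rate(teacher, x_clean, x_adv):
    """Fraction of samples where the frozen teacher's top-1 prediction
    differs between a clean input and its adversarial counterpart.
    A high rate would strain the anchor-reliability premise, since the
    anchor is only as meaningful as the teacher's clean prediction."""
    flips = teacher(x_clean).argmax(dim=1) != teacher(x_adv).argmax(dim=1)
    return flips.float().mean().item()
```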

What would settle it

An experiment in which the proposed method shows higher hyperparameter sensitivity or a worse robustness-accuracy trade-off than classical distillation-based adversarial training adaptations on CIFAR-10 or ImageNet under photometric shifts would falsify the central claim.
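The robust-accuracy half of that test is standardly measured with PGD-20, the attack the paper's figures report (ε = 8/255 on CIFAR-10, ε = 2/255 on ImageNet). A self-contained evaluation sketch under those conventions, not the authors' harness:

```python
import torch
import torch.nn.functional as F

def pgd_robust_accuracy(model, loader, eps=8/255, alpha=2/255, steps=20,
                        device="cpu"):
    """Top-1 accuracy under an L_inf PGD attack with `steps` iterations."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
        for _ in range(steps):
            x_adv = x_adv.detach().requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad, = torch.autograd.grad(loss, x_adv)
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total
```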

Figures

Figures reproduced from arXiv: 2604.11590 by Giorgio Buttazzo, Giulio Rossolini, Stefano Bianchettin.

Figure 1
Figure 1: Overview of the proposed Teacher-guided Robust Adaptation (TgRA) framework, detailed in Section 5: the student model is fine-tuned at test-time under distribution shift, using a non-robust teacher as a reference for both the accuracy and robustness objectives, unlike TRADES-like approaches. The experimental analysis evaluates the proposed approach on CIFAR-10 and ImageNet-Val under simulated target scena…
Figure 2
Figure 2: Clean and robust test accuracy under PGD-20 attacks during test-time finetuning on CIFAR-10 (ε = 8/255) and ImageNet-Val (ε = 2/255), evaluated across different levels of distribution-shift severity. For ImageNet (Figure 2b), the plots show a similar trend across all severity levels, with TgRA exhibiting the most favorable convergence behavior. After the initial drop in clean accuracy during the first epoc…
Figure 3
Figure 3: Clean and robust accuracy (Top-1) on ImageNet validation under increasing photometric corruption severity. Robust accuracy is evaluated using PGD-20 with different ε. Panels: (a) CIFAR-10, (b) ImageNet.
Figure 4
Figure 4: Examples of the target-domain corruptions for CIFAR-10 (top) and ImageNet (bottom).
Original abstract

Nowadays, pretrained models are increasingly used as general-purpose backbones and adapted at test-time to downstream environments where target data are scarce and unlabeled. While this paradigm has proven effective for improving clean accuracy on the target domain, adversarial robustness has received far less attention, especially when the original pretrained model is not explicitly designed to be robust. This raises a practical question: can a pretrained, non-robust model be adapted at test-time to improve adversarial robustness on a target distribution? To face this question, this work studies how adversarial training strategies behave when integrated into adaptation schemes for the unsupervised test-time setting, where only a small set of unlabeled target samples is available. It first analyzes how classical adversarial training formulations can be extended to this scenario, showing that straightforward distillation-based adaptations remain unstable and highly sensitive to hyperparameter tuning, particularly when the teacher itself is non-robust. To address these limitations, the work proposes a label-free framework that uses the predictions of a non-robust teacher model as a semantic anchor for both the clean and adversarial objectives during adaptation. We further provide theoretical insights showing that our formulation yields a more stable alternative to the self-consistency-based regularization commonly used in classical adversarial training. Experiments evaluate the proposed approach on CIFAR-10 and ImageNet under induced photometric transformations. The results support the theoretical insights by showing that the proposed approach achieves improved optimization stability, lower sensitivity to parameter choices, and a better robustness-accuracy trade-off than existing baselines in this post-deployment test-time setting.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, a circularity audit, and an axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that a pretrained non-robust model can be adapted at test-time for improved adversarial robustness on unlabeled target data by using the teacher's own predictions as semantic anchors in a label-free framework for both clean and adversarial objectives. It analyzes limitations of classical adversarial training extensions (instability and hyperparameter sensitivity), proposes this anchor-based alternative with theoretical insights showing greater stability than self-consistency regularization, and reports experiments on CIFAR-10 and ImageNet under photometric transformations demonstrating better optimization stability, lower parameter sensitivity, and improved robustness-accuracy trade-offs.

Significance. If the central claims hold, the work would be significant for test-time adaptation and adversarial robustness literature. It tackles the practical problem of post-deployment robustness improvement without labels or robust teachers, offering a potentially more stable formulation than distillation or self-consistency baselines. Credit is due for the theoretical insights on stability and the empirical evaluation on standard benchmarks (CIFAR-10, ImageNet), which provide concrete support for the trade-off improvements when the assumptions are met.

major comments (3)
  1. [Theoretical insights and proposed framework] The central claim in the proposed framework (as described in the abstract and methods) rests on the non-robust teacher's predictions serving as reliable semantic anchors for the adversarial objective. This assumption is load-bearing for the stability advantage over self-consistency regularization; if small perturbations flip the teacher's outputs, the joint optimization could be destabilized rather than stabilized. The theoretical analysis should explicitly address or bound the impact of such inconsistencies.
  2. [Experiments] Experiments section: evaluation is performed under induced photometric transformations on CIFAR-10 and ImageNet. These do not necessarily replicate the prediction flips induced by the adversarial perturbations inside the adaptation objective, leaving the key assumption about anchor reliability untested for the adversarial case. Direct experiments or analysis on the generated adversarial examples during adaptation are needed to substantiate the reported stability gains.
  3. [Introduction and classical adversarial training analysis] The abstract states that straightforward distillation-based adaptations remain unstable, but without a detailed comparison (e.g., specific hyperparameter ranges or failure modes in § on classical extensions), it is difficult to assess how much the proposed anchor-based objective improves upon them in a load-bearing way.
minor comments (2)
  1. [Abstract] The abstract refers to 'theoretical insights' without briefly characterizing the key result (e.g., a stability bound or reduced sensitivity derivation); adding one sentence would improve accessibility.
  2. [Proposed framework] Notation for the anchor-based objective and the clean/adversarial terms should be introduced with explicit equations early in the methods to aid readability.
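For illustration, the explicit equations that minor comment 2 requests might take the following form, with p_θ the student, t the frozen non-robust teacher, and β weighting the robust term. This is an editorial reconstruction of the two objectives being contrasted, not the paper's own notation:

```latex
% Distillation with a self-consistency robust term (TRADES-like):
% the regularization target p_theta(x) moves with theta.
\mathcal{L}_{\mathrm{sc}}(\theta) =
  \mathrm{KL}\!\big(p_\theta(x)\,\|\,t(x)\big)
  + \beta \max_{\|\delta\|_\infty \le \varepsilon}
    \mathrm{KL}\!\big(p_\theta(x+\delta)\,\|\,p_\theta(x)\big)

% Anchor-based objective: both terms pull toward the fixed t(x).
\mathcal{L}_{\mathrm{anchor}}(\theta) =
  \mathrm{KL}\!\big(p_\theta(x)\,\|\,t(x)\big)
  + \beta \max_{\|\delta\|_\infty \le \varepsilon}
    \mathrm{KL}\!\big(p_\theta(x+\delta)\,\|\,t(x)\big)
```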

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly where needed to strengthen the presentation.

Point-by-point responses
  1. Referee: [Theoretical insights and proposed framework] The central claim in the proposed framework (as described in the abstract and methods) rests on the non-robust teacher's predictions serving as reliable semantic anchors for the adversarial objective. This assumption is load-bearing for the stability advantage over self-consistency regularization; if small perturbations flip the teacher's outputs, the joint optimization could be destabilized rather than stabilized. The theoretical analysis should explicitly address or bound the impact of such inconsistencies.

    Authors: We thank the referee for highlighting this assumption. Our theoretical analysis shows that anchoring to fixed teacher predictions yields a more stable objective than self-consistency regularization by avoiding mutual error reinforcement. The non-robust teacher is held fixed, so its clean predictions serve as constant references for both branches. While adversarial flips in the teacher could occur, the formulation regularizes the student toward these fixed anchors, which our experiments indicate improves stability over baselines. We will revise the theoretical section to explicitly discuss this assumption and include a brief bound based on the loss Lipschitz constant (a minimal editorial sketch of such a bound appears after these responses). revision: partial

  2. Referee: [Experiments] Experiments section: evaluation is performed under induced photometric transformations on CIFAR-10 and ImageNet. These do not necessarily replicate the prediction flips induced by the adversarial perturbations inside the adaptation objective, leaving the key assumption about anchor reliability untested for the adversarial case. Direct experiments or analysis on the generated adversarial examples during adaptation are needed to substantiate the reported stability gains.

    Authors: We agree that direct analysis of teacher predictions on the adversarial examples generated during adaptation would provide stronger substantiation. Our current results demonstrate stability through training dynamics and final robustness metrics under the photometric shifts, with the adversarial objective applied at each step. We will add an analysis (new figure or subsection) measuring the rate of teacher prediction changes on the on-the-fly adversarial samples and its correlation with observed stability. revision: yes

  3. Referee: [Introduction and classical adversarial training analysis] The abstract states that straightforward distillation-based adaptations remain unstable, but without a detailed comparison (e.g., specific hyperparameter ranges or failure modes in § on classical extensions), it is difficult to assess how much the proposed anchor-based objective improves upon them in a load-bearing way.

    Authors: The instability and hyperparameter sensitivity of classical extensions are analyzed in Section 3, supported by preliminary experiments showing divergence for certain ranges. To make the motivation clearer from the outset, we will expand the introduction with a concise summary of these failure modes and example hyperparameter settings, while retaining the detailed treatment in Section 3. revision: partial
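On response 1's promised bound, a minimal editorial sketch of the argument, assuming the KL term is L-Lipschitz in its target distribution: because the anchor t(x) is constant in θ, the anchored regularizer's gradient carries a single θ-dependent factor, while self-consistency differentiates through both arguments and couples the two branches.

```latex
% Self-consistency: gradient flows through both arguments of the KL.
\nabla_\theta\, \mathrm{KL}\big(p_\theta(x')\,\|\,p_\theta(x)\big)
  = \partial_1\mathrm{KL}\cdot\nabla_\theta p_\theta(x')
  + \partial_2\mathrm{KL}\cdot\nabla_\theta p_\theta(x)

% Anchored: the target t(x) is constant in theta, so one term vanishes.
\nabla_\theta\, \mathrm{KL}\big(p_\theta(x')\,\|\,t(x)\big)
  = \partial_1\mathrm{KL}\cdot\nabla_\theta p_\theta(x')

% If KL is L-Lipschitz in its target, a hypothetical teacher flip on x'
% (anchoring to t(x') instead of t(x)) moves the objective by at most
\big|\,\mathrm{KL}\big(p_\theta(x')\,\|\,t(x)\big)
     - \mathrm{KL}\big(p_\theta(x')\,\|\,t(x')\big)\big|
  \le L\,\big\|t(x)-t(x')\big\|
```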

Circularity Check

0 steps flagged

No circularity detected; derivation introduces independent anchor-based objective

Full rationale

The paper's core contribution is a new label-free test-time adaptation framework that defines a semantic-anchor loss using the non-robust teacher's predictions for both clean and adversarial terms, together with separate theoretical stability arguments comparing it to self-consistency regularization. No equation reduces a claimed prediction or uniqueness result to a fitted parameter or prior self-citation by construction; the anchor is an explicit modeling choice rather than a re-labeling of data statistics, and the stability claim is presented as an independent analysis rather than a tautology. Experiments on photometric shifts are external to the derivation and do not close any loop. The derivation chain is therefore self-contained, with validation resting on external benchmarks rather than on its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review limits identification of specific elements; the method appears to rest on standard assumptions from adversarial training and test-time adaptation without new invented entities or fitted parameters explicitly called out.

pith-pipeline@v0.9.0 · 5575 in / 1039 out tokens · 59198 ms · 2026-05-10T15:30:47.729371+00:00 · methodology

