ZeroSiam: An Efficient Asymmetry for Test-Time Entropy Optimization without Collapse

Deyu Chen; Guohao Chen; Jiahao Yang; Mingkui Tan; Pengcheng Wu; Shuaicheng Niu; Zhiqi Shen; Zitian Zhang

arXiv: 2509.23183 · v3 · pith:DGAYPAVE · submitted 2025-09-27 · 💻 cs.LG · cs.NI

Guohao Chen, Shuaicheng Niu, Deyu Chen, Jiahao Yang, Zitian Zhang, Mingkui Tan, Pengcheng Wu, Zhiqi Shen

Pith reviewed 2026-05-21 21:31 UTC · model grok-4.3

classification 💻 cs.LG cs.NI

keywords: test-time adaptation, entropy minimization, collapse prevention, Siamese architecture, asymmetric divergence, vision adaptation, LLM reasoning

The pith

An asymmetric Siamese architecture with a learnable predictor and stop-gradient prevents collapse in test-time entropy minimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Pure entropy minimization at test time can push models toward collapsed outputs like constant one-hot predictions that minimize the objective without real adaptation. ZeroSiam counters this by building an efficient asymmetric Siamese structure that aligns divergences between branches. The asymmetry comes from adding a learnable predictor on one side and a stop-gradient before the classifier on the other. This setup not only blocks trivial collapse but also regularizes biased learning signals, which improves results even on runs that would not have collapsed anyway. Experiments across vision tasks and large language model reasoning show stable gains with almost no added cost.

Core claim

ZeroSiam prevents collapse through asymmetric divergence alignment, efficiently achieved by a learnable predictor and a stop-gradient operator before the classifier. We provide empirical and theoretical evidence that ZeroSiam not only prevents collapse, but also regularizes biased learning signals, enhancing performance even when no collapse occurs. Despite its simplicity, extensive results show that ZeroSiam performs more stably over prior methods using negligible overhead, demonstrating efficacy on both vision adaptation and large language model reasoning tasks across challenging test scenarios and diverse models, including particularly collapse-prone tiny models.

What carries the argument

ZeroSiam, an asymmetric Siamese architecture that creates divergence alignment using a learnable predictor paired with a stop-gradient operator before the classifier.
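The two-branch structure can be made concrete with a small forward-pass sketch. It is not the authors' implementation: the summary does not give the exact loss, so the form below is an assumption made for illustration. The online branch passes a feature through a hypothetical square `predictor` matrix and a shared linear `classifier`; the target branch sends the same feature, behind a stop-gradient, straight to the classifier; the objective is the online entropy plus a KL alignment term weighted by a hypothetical `align_weight`. Plain Python has no autograd, so the stop-gradient is only marked in a comment.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_div(p, q):
    """KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def asymmetric_siamese_loss(feature, classifier, predictor, align_weight=1.0):
    """Hypothetical ZeroSiam-style objective (assumed form, for illustration).

    Returns (total, entropy_term, alignment_term).
    """
    # Online branch: feature -> learnable predictor -> shared classifier.
    p_online = softmax(matvec(classifier, matvec(predictor, feature)))
    # Target branch: stop-gradient on the feature, then the same classifier.
    # In an autograd framework this would be classifier(feature.detach()).
    p_target = softmax(matvec(classifier, feature))
    ent = entropy(p_online)
    align = kl_div(p_target, p_online)
    return ent + align_weight * align, ent, align


if __name__ == "__main__":
    feature = [1.0, -0.5]
    classifier = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    swap = [[0.0, 1.0], [1.0, 0.0]]
    print(asymmetric_siamese_loss(feature, classifier, identity))
    print(asymmetric_siamese_loss(feature, classifier, swap))
```

With an identity predictor the two branches agree and the alignment term is exactly zero; once the predictor moves away from identity, the term becomes positive and penalizes disagreement between the branches.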

If this is right

Test-time adaptation becomes more stable on tiny models that previously collapsed under entropy minimization.
Performance gains appear on both vision adaptation and large language model reasoning with negligible extra compute.
Biased learning signals get regularized even in regimes where collapse would not have occurred.
The method works with negligible overhead compared with earlier regularization approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same asymmetry pattern could be inserted into other test-time objectives such as pseudo-labeling or consistency losses to check for similar collapse resistance.
If the predictor-plus-stop-gradient pair proves robust, future work might replace more complex regularizers with this lightweight asymmetry.
Extending the approach to continual test-time adaptation over long sequences of shifting data could test whether the regularization effect persists.

Load-bearing premise

The combination of a learnable predictor plus stop-gradient will reliably produce useful asymmetry that prevents collapse and regularizes signals across many models, datasets, and regimes without creating new failure modes or needing heavy tuning.

What would settle it

Run test-time entropy minimization on a collapse-prone tiny vision model with and without the stop-gradient operator; check whether constant-class outputs appear only in the version that removes the stop-gradient.
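A check of this kind needs a concrete collapse criterion. One simple, assumed choice is the share of argmax predictions that land in the single most frequent class over a window of test samples. The helper names and the 0.95 `threshold` below are hypothetical, not values from the paper.

```python
from collections import Counter


def dominant_class_fraction(predictions):
    """Share of predictions that fall in the single most frequent class."""
    if not predictions:
        raise ValueError("need at least one prediction")
    _, count = Counter(predictions).most_common(1)[0]
    return count / len(predictions)


def looks_collapsed(predictions, threshold=0.95):
    """Flag a window of argmax predictions as collapsed to a constant class.

    `threshold` is a hypothetical knob; the paper summary gives no value.
    """
    return dominant_class_fraction(predictions) >= threshold
```

Running this on the prediction streams of the two variants (with and without the stop-gradient) gives the comparison described above. Under label shift a legitimately dominant class can also push the fraction up, so the threshold should be set against the expected test label distribution.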

Figures

Figures reproduced from arXiv: 2509.23183 by Deyu Chen, Guohao Chen, Jiahao Yang, Mingkui Tan, Pengcheng Wu, Shuaicheng Niu, Zhiqi Shen, Zitian Zhang.

**Figure 2.** Empirical evidence of ZeroSiam's stabilization effects. (a) records the Frobenius distance …

**Figure 3.** Resistance to learning from noise. Models pre-adapt on N samples of pure Gaussian noise, then run TTA on ImageNet-C (level 5).

Resistance to Learning from Noise. In dynamic real-world applications, models may frequently encounter test data that are severely corrupted and non-semantic, such as extremely occluded frames and pure sensor noise where no valid label exists. Minimizing entropy on these data can then be misle…

**Figure 4.** Sensitivity to learning rates. Results are reported on ImageNet-C (level 5) with ViT-Base under label shifts w.r.t. accuracy.

Sensitivity to Learning Rates. We further examine the sensitivity of ZeroSiam to learning rates. When the predictor learning rate is set to zero, the predictor becomes a frozen identity and ZeroSiam degenerates to Tent. From …
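The degenerate case can be checked with a short gradient calculation. This is a sketch under an assumption the summary does not confirm: that the alignment term is KL(p_target || softmax(z)), where z are the online logits and p_target is held constant by the stop-gradient. Its gradient with respect to z is softmax(z) - p_target, while the entropy H(softmax(z)) has gradient -p_j (log p_j + H). With a frozen identity predictor the online logits equal the target logits, so the alignment gradient is exactly zero and only the entropy (Tent) gradient remains. The helper names `entropy_grad` and `alignment_grad` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_grad(logits):
    """Gradient of H(softmax(z)) w.r.t. z: dH/dz_j = -p_j * (log p_j + H)."""
    p = softmax(logits)
    h = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return [-pi * (math.log(pi) + h) if pi > 0 else 0.0 for pi in p]


def alignment_grad(online_logits, target_probs):
    """Gradient of KL(target || softmax(z)) w.r.t. z, target held constant."""
    p_online = softmax(online_logits)
    return [po - pt for po, pt in zip(p_online, target_probs)]


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]
    # Frozen identity predictor: online logits equal the target logits.
    target = softmax(logits)
    print("alignment grad:", alignment_grad(logits, target))
    print("entropy grad:  ", entropy_grad(logits))
```

Both gradients sum to zero over the classes because softmax is invariant to adding a constant to all logits; only the entropy gradient is nonzero in the identity case.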

**Figure 1.** Our IMAGENET-C dataset consists of 15 types of algorithmically generated corruptions from noise, blur, weather, and digital categories. Each type of corruption has five levels of severity, resulting in 75 distinct corruptions. See different severity levels in Appendix B.
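The 15 × 5 = 75 grid described in the ImageNet-C caption can be enumerated as a small configuration sketch. The category-to-corruption mapping follows the standard ImageNet-C benchmark as we understand it, and the identifier strings mirror the folder names commonly used in its public release; both should be treated as assumptions rather than something this page states. The names `IMAGENET_C`, `all_conditions`, and `conditions_at` are hypothetical.

```python
# Corruption names grouped by the four categories in the caption.
# Identifier strings are assumed to match the public ImageNet-C folder names.
IMAGENET_C = {
    "noise": ["gaussian_noise", "shot_noise", "impulse_noise"],
    "blur": ["defocus_blur", "glass_blur", "motion_blur", "zoom_blur"],
    "weather": ["snow", "frost", "fog", "brightness"],
    "digital": ["contrast", "elastic_transform", "pixelate", "jpeg_compression"],
}
SEVERITY_LEVELS = range(1, 6)  # five severity levels per corruption


def all_conditions():
    """Every (corruption, severity) pair in the benchmark grid."""
    return [
        (name, level)
        for names in IMAGENET_C.values()
        for name in names
        for level in SEVERITY_LEVELS
    ]


def conditions_at(level):
    """All corruptions at one severity, e.g. level 5 as in the ZeroSiam figures."""
    return [(name, lvl) for name, lvl in all_conditions() if lvl == level]


if __name__ == "__main__":
    print(len(all_conditions()), "conditions;", len(conditions_at(5)), "at level 5")
```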

Original abstract

Test-time entropy minimization helps adapt a model to novel environments and incentivize its reasoning capability, unleashing the model's potential during inference by allowing it to evolve and improve in real-time using its own predictions, achieving promising performance. However, pure entropy minimization can favor non-generalizable shortcuts, such as inflating the logit norm and driving all predictions to a dominant class to reduce entropy, risking collapsed solutions (e.g., constant one-hot outputs) that trivially minimize the objective without meaningful learning. In this paper, we reveal asymmetry as a key mechanism for collapse prevention and introduce ZeroSiam--an efficient asymmetric Siamese architecture tailored for test-time entropy minimization. ZeroSiam prevents collapse through asymmetric divergence alignment, efficiently achieved by a learnable predictor and a stop-gradient operator before the classifier. We provide empirical and theoretical evidence that ZeroSiam not only prevents collapse, but also regularizes biased learning signals, enhancing performance even when no collapse occurs. Despite its simplicity, extensive results show that ZeroSiam performs more stably over prior methods using negligible overhead, demonstrating efficacy on both vision adaptation and large language model reasoning tasks across challenging test scenarios and diverse models, including particularly collapse-prone tiny models.
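The logit-norm shortcut named in the abstract can be seen directly: multiplying every logit by a scale greater than one leaves the argmax unchanged but lowers the softmax entropy, because the derivative of the entropy with respect to the scale β is -β times the variance of the logits under the softmax distribution. An entropy objective can therefore be reduced without any change in which class is predicted. The sketch below is illustrative only; the logits are made-up values and `scaled_prediction` is a hypothetical helper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def scaled_prediction(logits, scale):
    """Return (entropy, argmax) after multiplying all logits by `scale`."""
    probs = softmax([scale * z for z in logits])
    return entropy(probs), probs.index(max(probs))


if __name__ == "__main__":
    logits = [1.2, 0.9, -0.3]  # illustrative values
    for scale in (1.0, 2.0, 5.0, 20.0):
        h, top = scaled_prediction(logits, scale)
        print(f"scale={scale:5.1f}  entropy={h:.4f}  argmax={top}")
```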

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ZeroSiam gives a simple asymmetric Siamese fix that stops collapse in test-time entropy minimization and shows gains on vision and LLM tasks with low overhead.

The main point is that this paper pins collapse in test-time entropy minimization on symmetric signals and counters it with an asymmetric Siamese structure: a learnable predictor on one side and a stop-gradient right before the classifier on the other. That construction is the concrete new piece, and they argue it creates useful divergence alignment while also damping biased gradients even when collapse is not happening. They back this with experiments on vision adaptation and LLM reasoning, including the small models that are most prone to trivial one-hot solutions, and they report stable results at negligible extra cost compared with earlier fixes.

The design stays lightweight, which matters for real deployment where you cannot afford heavy extra modules at inference. On the positive side, the empirical coverage across tasks and model sizes gives a practical sense that the asymmetry helps without introducing obvious new instabilities in the regimes they tested. The theoretical part is mentioned as supporting evidence for why the asymmetry blocks logit-norm inflation and constant outputs.

Soft spots are mainly around robustness. The claim that this specific predictor-plus-stop-gradient combo works reliably without per-task retuning or new failure modes rests on the reported ablations and the tiny-model results, but those still leave room for questions about predictor initialization sensitivity or behavior under larger distribution shifts or longer sequences. If the theory is mostly explanatory rather than a parameter-free derivation, that limits how far the generalization argument can be pushed without more exhaustive checks.

Overall this is for people already using or studying entropy-based test-time adaptation who need a drop-in stabilizer. Practitioners and TTA researchers would get the most direct value.

It is worth sending for peer review because the problem is real, the proposed fix is cheap and targeted, and the experiments give enough signal to merit detailed referee scrutiny even if revisions are needed on the theory and sensitivity analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated premise that the stop-gradient and predictor create a stable asymmetric signal.

pith-pipeline@v0.9.0 · 5764 in / 1095 out tokens · 69677 ms · 2026-05-21T21:31:53.399270+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: ZeroSiam prevents collapse through asymmetric divergence alignment, efficiently achieved by a learnable predictor and a stop-gradient operator before the classifier.

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear

Paper passage: asymmetric predictor–target alignment to prevent collapsed constant solutions

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages · 4 internal anchors

[1] Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, and Dale Schuurmans. Understanding the impact of entropy on policy optimization. In International Conference on Machine Learning, pp. 151-160. PMLR, 2019.

[2] Adrien Bardes, Jean Ponce, and Yann LeCun. VICReg: Variance-invariance-covariance regularization for self-supervised learning. In International Conference on Learning Representations, 2022.

[3] Alexander Bartler, Andre Bühler, Felix Wiewel, Mario Döbler, and Bin Yang. MT3: Meta test-time training for self-supervised test-time adaption. In International Conference on Artificial Intelligence and Statistics, pp. 3080-3090. PMLR, 2022.

[4] Guohao Chen, Shuaicheng Niu, Deyu Chen, Shuhai Zhang, Changsheng Li, Yuanqing Li, and Mingkui Tan. Cross-device collaborative test-time adaptation. In Advances in Neural Information Processing Systems, volume 37, pp. 122917-122951, 2024.

[5] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, pp. 1597-1607, 2020.

[6] Xinlei Chen and Kaiming He. Exploring simple Siamese representation learning. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 15750-15758, 2021.

[7] MAA Codeforces. American invitational mathematics examination-AIME, 2024.

[8] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255, 2009.

[9] Zeshuai Deng, Guohao Chen, Shuaicheng Niu, Hui Luo, Shuhai Zhang, Yifan Yang, Renjie Chen, Wei Luo, and Mingkui Tan. Test-time model adaptation for quantized neural networks. arXiv preprint arXiv:2508.02180, 2025.

[10] Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The Llama 3 herd of models. arXiv e-prints, pp. arXiv-2407, 2024.

[11] Yossi Gandelsman, Yu Sun, Xinlei Chen, and Alexei Efros. Test-time training with masked autoencoders. In Advances in Neural Information Processing Systems, volume 35, pp. 29374-29385, 2022.

[12] Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In Advances in Neural Information Processing Systems, volume 17, 2004.

[13] Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, et al. Bootstrap your own latent: A new approach to self-supervised learning. In Advances in Neural Information Processing Systems, volume 33, pp. 21271-21284, 2020.

[14] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pp. 1861-1870. PMLR, 2018.

[15] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729-9738, 2020.

[16] Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. In International Conference on Learning Representations, pp. 1-11, 2019.

[17] Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the MATH dataset. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.

[18] Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022.

[19] Jinwu Hu, Zitian Zhang, Guohao Chen, Xutao Wen, Chao Shuai, Wei Luo, Bin Xiao, Yuanqing Li, and Mingkui Tan. Test-time learning for large language models. In International Conference on Machine Learning. PMLR, 2025a.

[20] Zixuan Hu, Yichun Hu, Xiaotong Li, Shixiang Tang, and Lingyu Duan. Beyond entropy: Region confidence proxy for wild test-time adaptation. In International Conference on Machine Learning, 2025b.

[21] Hyosoon Jang, Yunhui Jang, Sungjae Lee, Jungseul Ok, and Sungsoo Ahn. Self-training large language models with confident reasoning. arXiv preprint arXiv:2505.17454, 2025.

[22] Jing Jiang and ChengXiang Zhai. Instance weighting for domain adaptation in NLP. In Annual Meeting of the Association for Computational Linguistics, 2007.

[23] Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M Kakade, and Michael I Jordan. How to escape saddle points efficiently. In International Conference on Machine Learning, pp. 1724-1732. PMLR, 2017.

[24] Dong-Hyun Lee et al. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, 2013.

[25] Jonghyun Lee, Dahuin Jung, Saehyung Lee, Junsung Park, Juhyeon Shin, Uiwon Hwang, and Sungroh Yoon. Entropy is not enough for test-time adaptation: From the perspective of disentangled factors. In International Conference on Learning Representations, 2024.

[26] Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, et al. Solving quantitative reasoning problems with language models. In Advances in Neural Information Processing Systems, volume 35, pp. 3843-3857, 2022.

[27] Jian Liang, Ran He, and Tieniu Tan. A comprehensive survey on test-time adaptation under distribution shifts. International Journal of Computer Vision, 133(1): 31-64, 2025.

[28] Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let's verify step by step. In The Twelfth International Conference on Learning Representations, 2023.

[29] Yuejiang Liu, Parth Kothari, Bastien van Delft, Baptiste Bellot-Gurlet, Taylor Mordan, and Alexandre Alahi. TTT++: When does self-supervised test-time training fail or thrive? In Advances in Neural Information Processing Systems, volume 34, pp. 21808-21820, 2021.

[30] Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I Jordan. Unsupervised domain adaptation with residual transfer networks. In Advances in Neural Information Processing Systems, volume 29, 2016.

[31] Robert A Marsden, Mario Döbler, and Bin Yang. Universal test-time adaptation through weight ensembling, diversity weighting, and prior correction. In Winter Conference on Applications of Computer Vision, pp. 2555-2565, 2024.

[32] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540): 529-533, 2015.

[33] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pp. 1928-1937. PMLR, 2016.

[34] Pietro Morerio, Jacopo Cavazza, and Vittorio Murino. Minimal-entropy correlation alignment for unsupervised deep domain adaptation. arXiv preprint arXiv:1711.10288, 2017.

[35] Michal Nauman, Mateusz Ostaszewski, Krzysztof Jankowski, Piotr Miłoś, and Marek Cygan. Bigger, regularized, optimistic: Scaling for compute and sample efficient continuous control. Advances in Neural Information Processing Systems, 37: 113038-113071, 2024.

[36] Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Yaofo Chen, Shijian Zheng, Peilin Zhao, and Mingkui Tan. Efficient test-time model adaptation without forgetting. In International Conference on Machine Learning, pp. 16888-16905. PMLR, 2022.

[37] Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Zhiquan Wen, Yaofo Chen, Peilin Zhao, and Mingkui Tan. Towards stable test-time adaptation in dynamic wild world. In International Conference on Learning Representations, pp. 1-14, 2023.

[38] Shuaicheng Niu, Chunyan Miao, Guohao Chen, Pengcheng Wu, and Peilin Zhao. Test-time model adaptation with only forward passes. In International Conference on Machine Learning, 2024.

[39] Shuaicheng Niu, Guohao Chen, Deyu Chen, Yifan Zhang, Jiaxiang Wu, Zhiquan Wen, Yaofo Chen, Peilin Zhao, Chunyan Miao, and Mingkui Tan. Adapt in the wild: Test-time entropy minimization with sharpness and feature regularization. arXiv preprint arXiv:2509.04977, 2025a.

[40] Shuaicheng Niu, Guohao Chen, Peilin Zhao, Tianyi Wang, Pengcheng Wu, and Zhiqi Shen. Self-bootstrapping for versatile test-time adaptation. In International Conference on Machine Learning, 2025b.

[41] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.

[42] Mehdi Sajjadi, Mehran Javanmardi, and Tolga Tasdizen. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In Advances in Neural Information Processing Systems, volume 29, 2016.

[43] Nikunj Saunshi, Orestis Plevrakis, Sanjeev Arora, Mikhail Khodak, and Hrishikesh Khandeparkar. A theoretical analysis of contrastive unsupervised representation learning. In International Conference on Machine Learning, pp. 5628-5637. PMLR, 2019.

[44] John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International Conference on Machine Learning, pp. 1889-1897. PMLR, 2015.

[45] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

[46] Max Schwarzer, Johan Samir Obando Ceron, Aaron Courville, Marc G Bellemare, Rishabh Agarwal, and Pablo Samuel Castro. Bigger, better, faster: Human-level Atari with human-level efficiency. In International Conference on Machine Learning, pp. 30365-30380. PMLR, 2023.

[47] Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei Efros, and Moritz Hardt. Test-time training with self-supervision for generalization under distribution shifts. In International Conference on Machine Learning, pp. 9229-9248, 2020.

[48] Mingkui Tan, Guohao Chen, Jiaxiang Wu, Yifan Zhang, Yaofo Chen, Peilin Zhao, and Shuaicheng Niu. Uncertainty-calibrated test-time model adaptation without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.

[49] Zhengyang Tang, Xingxing Zhang, Benyou Wang, and Furu Wei. MathScale: Scaling instruction tuning for mathematical reasoning. In Forty-first International Conference on Machine Learning, 2024.

[50] Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, and Trevor Darrell. Tent: Fully test-time adaptation by entropy minimization. In International Conference on Learning Representations, pp. 1-12, 2021.

[51] Ross Wightman. PyTorch image models. https://github.com/rwightman/pytorch-image-models, 2019.

[52] Shoukai Xu, Mingkui Tan, Liu Liu, Zhong Zhang, Peilin Zhao, et al. Test-time adapted reinforcement learning with action entropy regularization. In International Conference on Machine Learning, 2025.

[53] Hao Yang, Min Wang, Jinshen Jiang, and Yun Zhou. Towards test time adaptation via calibrated entropy minimization. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3736-3746, 2024.

[54] Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, and Stéphane Deny. Barlow twins: Self-supervised learning via redundancy reduction. In International Conference on Machine Learning, pp. 12310-12320. PMLR, 2021.

[55] Chaoning Zhang, Kang Zhang, Chenshuang Zhang, Trung X Pham, Chang D Yoo, and In So Kweon. How does SimSiam avoid collapse without negative samples? A unified understanding with self-supervised contrastive learning. In International Conference on Learning Representations, 2022a.

[56] Marvin Mengxin Zhang, Sergey Levine, and Chelsea Finn. MEMO: Test time robustness via adaptation and augmentation. In Advances in Neural Information Processing Systems, pp. 38629-38642, 2022b.

[57] Qingyang Zhang, Yatao Bian, Xinke Kong, Peilin Zhao, and Changqing Zhang. COME: Test-time adaption by conservatively minimizing entropy. In International Conference on Learning Representations, 2025.

[58] Brian D Ziebart, Andrew L Maas, J Andrew Bagnell, Anind K Dey, et al. Maximum entropy inverse reinforcement learning. In AAAI Conference on Artificial Intelligence, volume 8, pp. 1433-1438. Chicago, IL, USA, 2008.

[59] Yuxin Zuo, Kaiyan Zhang, Li Sheng, Shang Qu, Ganqu Cui, Xuekai Zhu, Haozhan Li, Yuchen Zhang, Xinwei Long, Ermo Hua, et al. TTRL: Test-time reinforcement learning. arXiv preprint arXiv:2504.16084, 2025.
[60]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page
[61]

@esa (Ref

\@ifxundefined[1] #1\@undefined \@firstoftwo \@secondoftwo \@ifnum[1] #1 \@firstoftwo \@secondoftwo \@ifx[1] #1 \@firstoftwo \@secondoftwo [2] @ #1 \@temptokena #2 #1 @ \@temptokena \@ifclassloaded agu2001 natbib The agu2001 class already includes natbib coding, so you should not add it explicitly Type <Return> for now, but then later remove the command n...

work page
[62]

\@lbibitem[] @bibitem@first@sw\@secondoftwo \@lbibitem[#1]#2 \@extra@b@citeb \@ifundefined br@#2\@extra@b@citeb \@namedef br@#2 \@nameuse br@#2\@extra@b@citeb \@ifundefined b@#2\@extra@b@citeb @num @parse #2 @tmp #1 NAT@b@open@#2 NAT@b@shut@#2 \@ifnum @merge>\@ne @bibitem@first@sw \@firstoftwo \@ifundefined NAT@b*@#2 \@firstoftwo @num @NAT@ctr \@secondoft...

work page
[63]

EmtJ ԾޭDJB #eM -_w4 KUhee]Y iuu((ٲAKr!@9. H O9G벓AN'^ rrd H + MF^CV s+\*6A /sp op? nOpD[5] |3 \쟻x_q1A=pg]Is`c DGb1 ] j ]7I `( NR 7ƾZ SNѝdy HNYA FV =# \0] I5

@open @close @open @close and [1] URL: #1 \@ifundefined chapter * \@mkboth \@ifxundefined @sectionbib * \@mkboth * \@mkboth\@gobbletwo \@ifclassloaded amsart * \@ifclassloaded amsbook * \@ifxundefined @heading @heading NAT@ctr thebibliography [1] @ \@biblabel @NAT@ctr \@bibsetup #1 @NAT@ctr @ @openbib .11em \@plus.33em \@minus.07em 4000 4000 `\.\@m @bibit...

work page 2013

[1] [1]

Understanding the impact of entropy on policy optimization

Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, and Dale Schuurmans. Understanding the impact of entropy on policy optimization. In International conference on machine learning, pp.\ 151--160. PMLR, 2019

work page 2019

[2] [2]

Vicreg: Variance-invariance-covariance regularization for self-supervised learning

Adrien Bardes, Jean Ponce, and Yann LeCun. Vicreg: Variance-invariance-covariance regularization for self-supervised learning. In International Conference on Learning Representations, 2022

work page 2022

[3] Alexander Bartler, Andre Bühler, Felix Wiewel, Mario Döbler, and Bin Yang. MT3: Meta test-time training for self-supervised test-time adaption. In International Conference on Artificial Intelligence and Statistics, pp. 3080–3090. PMLR, 2022

[4] Guohao Chen, Shuaicheng Niu, Deyu Chen, Shuhai Zhang, Changsheng Li, Yuanqing Li, and Mingkui Tan. Cross-device collaborative test-time adaptation. In Advances in Neural Information Processing Systems, volume 37, pp. 122917–122951, 2024

[5] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, pp. 1597–1607, 2020

[6] Xinlei Chen and Kaiming He. Exploring simple Siamese representation learning. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 15750–15758, 2021

[7] MAA Codeforces. American Invitational Mathematics Examination (AIME), 2024

[8] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009

[9] Zeshuai Deng, Guohao Chen, Shuaicheng Niu, Hui Luo, Shuhai Zhang, Yifan Yang, Renjie Chen, Wei Luo, and Mingkui Tan. Test-time model adaptation for quantized neural networks. arXiv preprint arXiv:2508.02180, 2025

[10] Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

[11] Yossi Gandelsman, Yu Sun, Xinlei Chen, and Alexei Efros. Test-time training with masked autoencoders. In Advances in Neural Information Processing Systems, volume 35, pp. 29374–29385, 2022

[12] Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In Advances in Neural Information Processing Systems, volume 17, 2004

[13] Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, et al. Bootstrap your own latent: A new approach to self-supervised learning. In Advances in Neural Information Processing Systems, volume 33, pp. 21271–21284, 2020

[14] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pp. 1861–1870. PMLR, 2018

[15] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738, 2020

[16] Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. In International Conference on Learning Representations, pp. 1–11, 2019

[17] Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the MATH dataset. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021

[18] Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022

[19] Jinwu Hu, Zitian Zhang, Guohao Chen, Xutao Wen, Chao Shuai, Wei Luo, Bin Xiao, Yuanqing Li, and Mingkui Tan. Test-time learning for large language models. In International Conference on Machine Learning. PMLR, 2025a

[20] Zixuan Hu, Yichun Hu, Xiaotong Li, Shixiang Tang, and Lingyu Duan. Beyond entropy: Region confidence proxy for wild test-time adaptation. In International Conference on Machine Learning, 2025b

[21] Hyosoon Jang, Yunhui Jang, Sungjae Lee, Jungseul Ok, and Sungsoo Ahn. Self-training large language models with confident reasoning. arXiv preprint arXiv:2505.17454, 2025

[22] Jing Jiang and ChengXiang Zhai. Instance weighting for domain adaptation in NLP. In Annual Meeting of the Association for Computational Linguistics, 2007

[23] Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M Kakade, and Michael I Jordan. How to escape saddle points efficiently. In International Conference on Machine Learning, pp. 1724–1732. PMLR, 2017

[24] Dong-Hyun Lee et al. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, 2013

[25] Jonghyun Lee, Dahuin Jung, Saehyung Lee, Junsung Park, Juhyeon Shin, Uiwon Hwang, and Sungroh Yoon. Entropy is not enough for test-time adaptation: From the perspective of disentangled factors. In International Conference on Learning Representations, 2024

[26] Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, et al. Solving quantitative reasoning problems with language models. In Advances in Neural Information Processing Systems, volume 35, pp. 3843–3857, 2022

[27] Jian Liang, Ran He, and Tieniu Tan. A comprehensive survey on test-time adaptation under distribution shifts. International Journal of Computer Vision, 133(1): 31–64, 2025

[28] Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let's verify step by step. In The Twelfth International Conference on Learning Representations, 2023

[29] Yuejiang Liu, Parth Kothari, Bastien van Delft, Baptiste Bellot-Gurlet, Taylor Mordan, and Alexandre Alahi. TTT++: When does self-supervised test-time training fail or thrive? In Advances in Neural Information Processing Systems, volume 34, pp. 21808–21820, 2021

[30] Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I Jordan. Unsupervised domain adaptation with residual transfer networks. In Advances in Neural Information Processing Systems, volume 29, 2016

[31] Robert A Marsden, Mario Döbler, and Bin Yang. Universal test-time adaptation through weight ensembling, diversity weighting, and prior correction. In Winter Conference on Applications of Computer Vision, pp. 2555–2565, 2024

[32] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540): 529–533, 2015

[33] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pp. 1928–1937. PMLR, 2016

[34] Pietro Morerio, Jacopo Cavazza, and Vittorio Murino. Minimal-entropy correlation alignment for unsupervised deep domain adaptation. arXiv preprint arXiv:1711.10288, 2017

[35] Michal Nauman, Mateusz Ostaszewski, Krzysztof Jankowski, Piotr Miłoś, and Marek Cygan. Bigger, regularized, optimistic: Scaling for compute and sample efficient continuous control. Advances in Neural Information Processing Systems, 37: 113038–113071, 2024

[36] Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Yaofo Chen, Shijian Zheng, Peilin Zhao, and Mingkui Tan. Efficient test-time model adaptation without forgetting. In International Conference on Machine Learning, pp. 16888–16905. PMLR, 2022

[37] Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Zhiquan Wen, Yaofo Chen, Peilin Zhao, and Mingkui Tan. Towards stable test-time adaptation in dynamic wild world. In International Conference on Learning Representations, pp. 1–14, 2023

[38] Shuaicheng Niu, Chunyan Miao, Guohao Chen, Pengcheng Wu, and Peilin Zhao. Test-time model adaptation with only forward passes. In International Conference on Machine Learning, 2024

[39] Shuaicheng Niu, Guohao Chen, Deyu Chen, Yifan Zhang, Jiaxiang Wu, Zhiquan Wen, Yaofo Chen, Peilin Zhao, Chunyan Miao, and Mingkui Tan. Adapt in the wild: Test-time entropy minimization with sharpness and feature regularization. arXiv preprint arXiv:2509.04977, 2025a

[40] Shuaicheng Niu, Guohao Chen, Peilin Zhao, Tianyi Wang, Pengcheng Wu, and Zhiqi Shen. Self-bootstrapping for versatile test-time adaptation. In International Conference on Machine Learning, 2025b

[41] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018

[42] Mehdi Sajjadi, Mehran Javanmardi, and Tolga Tasdizen. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In Advances in Neural Information Processing Systems, volume 29, 2016

[43] Nikunj Saunshi, Orestis Plevrakis, Sanjeev Arora, Mikhail Khodak, and Hrishikesh Khandeparkar. A theoretical analysis of contrastive unsupervised representation learning. In International Conference on Machine Learning, pp. 5628–5637. PMLR, 2019

[44] John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International Conference on Machine Learning, pp. 1889–1897. PMLR, 2015

[45] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

[46] Max Schwarzer, Johan Samir Obando Ceron, Aaron Courville, Marc G Bellemare, Rishabh Agarwal, and Pablo Samuel Castro. Bigger, better, faster: Human-level Atari with human-level efficiency. In International Conference on Machine Learning, pp. 30365–30380. PMLR, 2023

[47] Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei Efros, and Moritz Hardt. Test-time training with self-supervision for generalization under distribution shifts. In International Conference on Machine Learning, pp. 9229–9248, 2020

[48] Mingkui Tan, Guohao Chen, Jiaxiang Wu, Yifan Zhang, Yaofo Chen, Peilin Zhao, and Shuaicheng Niu. Uncertainty-calibrated test-time model adaptation without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

[49] Zhengyang Tang, Xingxing Zhang, Benyou Wang, and Furu Wei. MathScale: Scaling instruction tuning for mathematical reasoning. In Forty-first International Conference on Machine Learning, 2024

[50] Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, and Trevor Darrell. Tent: Fully test-time adaptation by entropy minimization. In International Conference on Learning Representations, pp. 1–12, 2021

[51] Ross Wightman. PyTorch image models. https://github.com/rwightman/pytorch-image-models, 2019

[52] Shoukai Xu, Mingkui Tan, Liu Liu, Zhong Zhang, Peilin Zhao, et al. Test-time adapted reinforcement learning with action entropy regularization. In International Conference on Machine Learning, 2025

[53] Hao Yang, Min Wang, Jinshen Jiang, and Yun Zhou. Towards test time adaptation via calibrated entropy minimization. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3736–3746, 2024

[54] Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, and Stéphane Deny. Barlow Twins: Self-supervised learning via redundancy reduction. In International Conference on Machine Learning, pp. 12310–12320. PMLR, 2021

[55] Chaoning Zhang, Kang Zhang, Chenshuang Zhang, Trung X Pham, Chang D Yoo, and In So Kweon. How does SimSiam avoid collapse without negative samples? A unified understanding with self-supervised contrastive learning. In International Conference on Learning Representations, 2022a

[56] Marvin Mengxin Zhang, Sergey Levine, and Chelsea Finn. MEMO: Test time robustness via adaptation and augmentation. In Advances in Neural Information Processing Systems, pp. 38629–38642, 2022b

[57] Qingyang Zhang, Yatao Bian, Xinke Kong, Peilin Zhao, and Changqing Zhang. COME: Test-time adaption by conservatively minimizing entropy. In International Conference on Learning Representations, 2025

[58] Brian D Ziebart, Andrew L Maas, J Andrew Bagnell, Anind K Dey, et al. Maximum entropy inverse reinforcement learning. In AAAI Conference on Artificial Intelligence, volume 8, pp. 1433–1438. Chicago, IL, USA, 2008

[59] Yuxin Zuo, Kaiyan Zhang, Li Sheng, Shang Qu, Ganqu Cui, Xuekai Zhu, Haozhan Li, Yuchen Zhang, Xinwei Long, Ermo Hua, et al. TTRL: Test-time reinforcement learning. arXiv preprint arXiv:2504.16084, 2025
