arxiv: 1907.11202 · v1 · pith:KJ6M6R4Ynew · submitted 2019-07-25 · 💻 cs.LG · cs.CV · stat.ML

Unsupervised Domain Adaptation via Calibrating Uncertainties

Pith reviewed 2026-05-24 16:04 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML
keywords domain adaptation · Renyi entropy · uncertainty estimation · variational Bayes · regularization · unsupervised learning

The pith

Models can adapt from a labeled source domain to an unlabeled target domain by calibrating their predictive uncertainties measured with Renyi entropy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper seeks to perform unsupervised domain adaptation by reducing the difference in predictive uncertainties between source and target domains. It starts from the idea that models show more uncertainty on data they have not seen during training. Uncertainty is expressed through Renyi entropy, which leads to a regularization method that encourages calibration of these values. Variational Bayes is applied to get stable uncertainty estimates, and an additional term adjusts the variance of the model's parameters. The authors analyze the theory behind this and test it on several adaptation problems.

Core claim

The central discovery is that regularizing the Renyi entropy of predictions to match across domains, using variational Bayes for estimation and parameter variance calibration, allows a model trained only on source labels to perform well on the target domain.

What carries the argument

The Renyi entropy regularization (RER) framework that aligns uncertainties between domains.
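
As a point of reference, the uncertainty measure itself can be sketched in a few lines. The function below uses the standard textbook definition of Renyi entropy for a discrete distribution; the name `renyi_entropy`, the choice of `alpha=2.0`, and the two example distributions are illustrative assumptions, and how the paper wires this quantity into its training loss is not specified on this page.

```python
import math

def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha (in nats) for a discrete distribution."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if abs(alpha - 1.0) < 1e-12:
        # The alpha -> 1 limit of Renyi entropy is the Shannon entropy.
        return -sum(p * math.log(p) for p in probs if p > 0)
    return math.log(sum(p ** alpha for p in probs if p > 0)) / (1.0 - alpha)

# Hypothetical class-probability vectors, not taken from the paper.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]

print(renyi_entropy(confident, alpha=2.0))
print(renyi_entropy(uncertain, alpha=2.0))
```

A peaked prediction yields a value near zero, while a uniform prediction over K classes yields log K for every order alpha.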

If this is right

  • The uncertainties of source and target predictions can be brought into agreement through the proposed regularization.
  • Variational Bayes learning yields more dependable uncertainty measures for use in adaptation.
  • Adjusting the sample variance of network parameters provides an additional regularization effect during training.
  • The method has associated theoretical properties that justify its use.
  • It achieves good results on three different domain adaptation tasks.
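
To make the variational Bayes point concrete, here is a minimal sketch of how a predictive distribution can be estimated by averaging over sampled weights. It assumes a toy linear softmax classifier with a factorized Gaussian over its weights; the names `mc_predictive`, `w_mean`, and `w_std`, and all numbers, are hypothetical, and this is not claimed to be the paper's architecture or training procedure.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mc_predictive(x, w_mean, w_std, num_samples=500, seed=0):
    """Average softmax outputs over weights drawn from N(w_mean, w_std^2)."""
    rng = random.Random(seed)
    num_classes = len(w_mean)
    avg = [0.0] * num_classes
    for _ in range(num_samples):
        logits = []
        for k in range(num_classes):
            # Draw one weight vector for class k from the assumed Gaussian.
            w_k = [rng.gauss(m, s) for m, s in zip(w_mean[k], w_std[k])]
            logits.append(sum(w * xi for w, xi in zip(w_k, x)))
        probs = softmax(logits)
        avg = [a + p / num_samples for a, p in zip(avg, probs)]
    return avg

x = [1.0, 0.2]                      # hypothetical input features
w_mean = [[2.0, 0.0], [0.0, 2.0]]   # hypothetical posterior means
narrow = mc_predictive(x, w_mean, [[0.01, 0.01], [0.01, 0.01]])
wide = mc_predictive(x, w_mean, [[2.0, 2.0], [2.0, 2.0]])
print(narrow, wide)
```

In this toy setting, a larger weight variance flattens the averaged prediction, which is one way to see why the variance of the parameters and the predictive uncertainty are linked.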

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Uncertainty alignment might serve as a principle for handling other types of data distribution changes.
  • The framework could be combined with other adaptation techniques for better performance.
  • Experiments on larger scale problems would test if the calibration remains effective.

Load-bearing premise

A model trained only on source domain data will show higher uncertainty when applied to target domain data.

What would settle it

The claim would be undermined if applying the RER framework failed to reduce the gap in Renyi entropy between source and target predictions, or failed to raise target-domain classification accuracy above a source-only baseline.
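
The two checks described here can be sketched as follows. This is an illustrative measurement script under stated assumptions: the helper names `entropy_gap` and `accuracy`, the order `alpha=2.0`, and the toy prediction vectors are all hypothetical and are not results from the paper.

```python
import math

def renyi_entropy(probs, alpha=2.0):
    """Renyi entropy of order alpha (in nats); Shannon entropy at alpha = 1."""
    if abs(alpha - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return math.log(sum(p ** alpha for p in probs if p > 0)) / (1.0 - alpha)

def mean_entropy(batch, alpha=2.0):
    return sum(renyi_entropy(p, alpha) for p in batch) / len(batch)

def entropy_gap(source_batch, target_batch, alpha=2.0):
    """Mean target entropy minus mean source entropy."""
    return mean_entropy(target_batch, alpha) - mean_entropy(source_batch, alpha)

def accuracy(batch, labels):
    hits = sum(1 for p, y in zip(batch, labels) if p.index(max(p)) == y)
    return hits / len(labels)

# Toy predictions (made up for illustration only).
source_probs = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05]]
target_before = [[0.35, 0.40, 0.25], [0.30, 0.40, 0.30]]  # source-only model
target_after = [[0.80, 0.10, 0.10], [0.10, 0.80, 0.10]]   # adapted model
target_labels = [0, 1]  # held out, used for evaluation only

print(entropy_gap(source_probs, target_before),
      entropy_gap(source_probs, target_after))
print(accuracy(target_before, target_labels),
      accuracy(target_after, target_labels))
```

Target labels appear only in the evaluation step; in the unsupervised setting they would not be available during training.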

Original abstract

Unsupervised domain adaptation (UDA) aims at inferring class labels for unlabeled target domain given a related labeled source dataset. Intuitively, a model trained on source domain normally produces higher uncertainties for unseen data. In this work, we build on this assumption and propose to adapt from source to target domain via calibrating their predictive uncertainties. The uncertainty is quantified as the Renyi entropy, from which we propose a general Renyi entropy regularization (RER) framework. We further employ variational Bayes learning for reliable uncertainty estimation. In addition, calibrating the sample variance of network parameters serves as a plug-in regularizer for training. We discuss the theoretical properties of the proposed method and demonstrate its effectiveness on three domain-adaptation tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes to perform unsupervised domain adaptation by calibrating predictive uncertainties between source and target domains, where uncertainty is measured by Renyi entropy. It introduces a Renyi entropy regularization (RER) framework, uses variational Bayes learning for uncertainty estimation, and adds a sample-variance regularizer on network parameters. Theoretical properties of the approach are discussed and empirical effectiveness is claimed on three domain-adaptation tasks.

Significance. If the core premise holds, the work supplies a regularization-based UDA method that treats domain shift as an uncertainty mismatch, potentially complementing discrepancy or adversarial techniques. Explicit discussion of theoretical properties and the use of variational Bayes for reliable entropy estimates are strengths that could support reproducibility and analysis.

major comments (2)
  1. [Introduction] Introduction / assumption paragraph: the claim that source-trained models 'normally produce higher uncertainties for unseen data' is presented as intuitive and is load-bearing for the RER objective, yet no derivation, counter-example analysis, or condition is supplied showing that minimizing the proposed regularizer is equivalent to reducing domain discrepancy when the entropy ordering fails (e.g., due to label noise or model overconfidence).
  2. [Method] RER framework section: no formal argument establishes that the Renyi-entropy calibration step recovers target labels or aligns distributions when the source-target uncertainty gap is absent or reversed; the method therefore lacks a demonstrated necessary or sufficient condition for success.
minor comments (2)
  1. [Abstract] Abstract: states that effectiveness is demonstrated on three tasks but supplies no quantitative results, baselines, or metrics, which weakens the reader's ability to gauge empirical support from the summary alone.
  2. [Method] Notation: the Renyi entropy parameter alpha is listed as a free parameter; its selection procedure, sensitivity, and default value should be stated explicitly.
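
For context on the last comment, the role of alpha can be read off the standard definition of Renyi entropy and its well-known special cases, given below for a K-class predictive distribution p. These are textbook identities rather than formulas quoted from the paper, and which order the authors actually use is not stated on this page.

```latex
\begin{align}
H_\alpha(p) &= \frac{1}{1-\alpha}\,\log \sum_{i=1}^{K} p_i^{\alpha},
  \qquad \alpha \ge 0,\ \alpha \ne 1, \\
H_0(p) &= \log \bigl|\{\, i : p_i > 0 \,\}\bigr|
  && \text{(max-entropy)}, \\
\lim_{\alpha \to 1} H_\alpha(p) &= -\sum_{i=1}^{K} p_i \log p_i
  && \text{(Shannon entropy)}, \\
H_2(p) &= -\log \sum_{i=1}^{K} p_i^{2}
  && \text{(collision entropy)}, \\
\lim_{\alpha \to \infty} H_\alpha(p) &= -\log \max_i p_i
  && \text{(min-entropy)}.
\end{align}
```

Because H_alpha is non-increasing in alpha, larger orders weight the dominant class more heavily, so the chosen order changes how strongly a regularizer of this kind reacts to near-confident predictions.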

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for greater rigor around our core assumption and the conditions under which the proposed regularization succeeds. We address each point below and will revise the manuscript accordingly to clarify limitations and strengthen the presentation.

Point-by-point responses
  1. Referee: [Introduction] Introduction / assumption paragraph: the claim that source-trained models 'normally produce higher uncertainties for unseen data' is presented as intuitive and is load-bearing for the RER objective, yet no derivation, counter-example analysis, or condition is supplied showing that minimizing the proposed regularizer is equivalent to reducing domain discrepancy when the entropy ordering fails (e.g., due to label noise or model overconfidence).

    Authors: We agree that the assumption is presented as intuitive rather than formally derived. The manuscript motivates RER from the common observation that target-domain predictions exhibit higher uncertainty, but does not claim equivalence to domain discrepancy minimization in all regimes. We will revise the introduction to explicitly state the assumption, note potential failure modes such as label noise or overconfidence, and add a short discussion of conditions under which the uncertainty ordering is expected to hold, supported by references to related UDA literature. revision: yes

  2. Referee: [Method] RER framework section: no formal argument establishes that the Renyi-entropy calibration step recovers target labels or aligns distributions when the source-target uncertainty gap is absent or reversed; the method therefore lacks a demonstrated necessary or sufficient condition for success.

    Authors: The RER framework is introduced as a regularization technique that operates under the stated uncertainty-gap assumption; we do not claim it recovers labels or aligns distributions when the gap is absent or reversed. We will add a clarifying paragraph in the method section stating the operating regime, noting that success is not guaranteed outside this regime, and emphasizing that the approach complements rather than replaces discrepancy or adversarial methods. Empirical results on the three tasks are presented only for settings where the premise holds. revision: yes

Circularity Check

0 steps flagged

No circularity: method rests on explicit external assumption and standard tools without self-referential reduction

full rationale

The paper states an intuitive premise (source models yield higher uncertainty on target data) and builds a regularization framework (RER using Renyi entropy, variational Bayes estimation, and parameter variance) on top of it. No quoted equation or step shows a fitted parameter or self-citation being renamed as a prediction, nor does any derivation reduce the calibration objective to its own inputs by construction. Renyi entropy and variational Bayes are imported as standard external concepts rather than defined in terms of the adaptation result. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The primary load-bearing element is the domain assumption about uncertainty differences; no new entities are introduced in the abstract.

free parameters (1)
  • Renyi entropy parameter (alpha)
    The order of the Renyi entropy is a hyperparameter that may need tuning, though its selection is not detailed in the abstract.
axioms (1)
  • domain assumption: A model trained on source domain normally produces higher uncertainties for unseen data
    Explicitly stated as the intuitive basis for the method in the abstract.

