pith. machine review for the scientific record.

arxiv: 2604.09863 · v1 · submitted 2026-04-10 · 💻 cs.CV · cs.AI

Recognition: unknown

PAS: Estimating the target accuracy before domain adaptation

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 17:37 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords domain adaptation · transferability estimation · pre-trained models · source domain selection · image classification · feature embeddings · accuracy prediction

The pith

PAS predicts the target accuracy a model would reach after domain adaptation, before any adaptation is run, using only pre-trained feature embeddings of source and target data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Selecting a source domain and pre-trained model for domain adaptation is difficult without labeled target data, as exhaustive trials are computationally expensive. The paper proposes PAS, a score that measures compatibility by comparing embeddings produced by a pre-trained model on source and target samples. This score is intended to predict the accuracy that would result after running adaptation. A sympathetic reader would care because an accurate upfront estimate would let users pick the best combination quickly, avoid wasted adaptation runs on poor matches, and improve final performance on image classification tasks.

Core claim

PAS is a score that assesses source-target compatibility from pre-trained feature embeddings and correlates strongly with the actual target accuracy obtained after domain adaptation, allowing a framework to identify the most suitable pre-trained model and source domain among multiple candidates without performing adaptation or accessing target labels.

What carries the argument

PAS score, which quantifies compatibility between source and target using pre-trained feature embeddings to predict post-adaptation accuracy.
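The abstract does not spell out how PAS is computed. As a hedged illustration only, the sketch below shows what an embedding-based compatibility score of this general kind could look like, following the intuition of Figure 2 (target samples should sit near same-class source centroids in the frozen embedding space). The function name and the centroid-similarity heuristic are assumptions, not the paper's formula.

```python
import numpy as np

def compatibility_score(src_feats, src_labels, tgt_feats):
    """Hypothetical embedding-compatibility score (NOT the paper's PAS formula).

    Measures how close each unlabeled target sample sits to its nearest
    source class centroid in a frozen pre-trained embedding space.
    Higher values suggest the target is well covered by source classes.
    """
    # L2-normalize features so dot products become cosine similarities
    src = src_feats / np.linalg.norm(src_feats, axis=1, keepdims=True)
    tgt = tgt_feats / np.linalg.norm(tgt_feats, axis=1, keepdims=True)

    # one centroid per source class, renormalized to unit length
    classes = np.unique(src_labels)
    centroids = np.stack([src[src_labels == c].mean(axis=0) for c in classes])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)

    # each target sample's similarity to its nearest class centroid
    sims = tgt @ centroids.T  # shape: (n_target, n_classes)
    return float(sims.max(axis=1).mean())
```

A well-aligned target (samples clustered near source centroids) scores near 1; an unrelated target scores much lower, which is the ranking behavior a transferability score needs.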

If this is right

  • The source and pre-trained model with the highest PAS score will produce the best target accuracy after adaptation.
  • The selection framework avoids running adaptation for every candidate combination.
  • PAS maintains strong correlation with target accuracy across standard image classification benchmarks.
  • Using PAS reduces overall computational cost while still achieving higher final performance than random or heuristic selection.
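The selection framework these bullets describe reduces to scoring every candidate once and taking the argmax, with no adaptation run per candidate. A minimal sketch, where `select_best`, the tuple layout, and `score_fn` are illustrative names rather than the paper's API:

```python
def select_best(candidates, tgt_feats, score_fn):
    """Rank (source, pre-trained model) candidates by a PAS-style score.

    candidates: list of (name, src_feats, src_labels) tuples, where the
        features were extracted by that candidate's pre-trained model.
    score_fn:   any compatibility score computed from embeddings alone.
    Returns the top candidate's name plus the full ranking -- no
    adaptation is executed and no target labels are touched.
    """
    scored = sorted(
        ((score_fn(feats, labels, tgt_feats), name)
         for name, feats, labels in candidates),
        reverse=True,  # highest compatibility first
    )
    return scored[0][1], scored
```

The cost is one embedding pass and one score per candidate, versus one full adaptation run per candidate under exhaustive search.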

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • PAS could be applied early to decide whether any adaptation is worthwhile for a given target before committing resources.
  • The embedding-based compatibility idea might extend to other transfer settings such as object detection where source selection is also costly.
  • Pre-training pipelines could be optimized by maximizing expected PAS against common target distributions.

Load-bearing premise

Compatibility assessed solely from pre-trained feature embeddings on source and target data accurately predicts post-adaptation target accuracy without needing to run the adaptation or access target labels.

What would settle it

Select several source-model pairs for a given target dataset, compute PAS for each, run the actual domain adaptation on all pairs, and check two things on held-out labels: whether the pair with the highest PAS score yields the highest measured target accuracy, and whether the rank correlation between PAS and accuracy is strong across all pairs.
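That decisive experiment can be sketched as a rank-agreement check. Here `spearman` is a dependency-free Spearman correlation (assuming no ties), `settles_it` is an illustrative helper, the 0.8 threshold is an assumed cutoff, and the numbers in the usage below are invented, not results from the paper:

```python
def spearman(xs, ys):
    """Spearman rank correlation for tie-free lists, hand-rolled
    to avoid a scipy dependency."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def settles_it(pas_scores, measured_acc, corr_threshold=0.8):
    """True iff the highest-PAS pair also wins on measured accuracy
    AND the overall rank correlation clears the threshold."""
    top_agrees = (max(range(len(pas_scores)), key=pas_scores.__getitem__)
                  == max(range(len(measured_acc)), key=measured_acc.__getitem__))
    return top_agrees and spearman(pas_scores, measured_acc) >= corr_threshold
```

For example, `settles_it([0.2, 0.5, 0.9, 0.7], [55.1, 68.4, 81.0, 74.2])` holds, while a PAS ranking that inverts the accuracy ordering fails both checks.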

Figures

Figures reproduced from arXiv: 2604.09863 by Jackson de Faria, Martin Ester, Raphaella Diniz.

Figure 1
Figure 1. The Potential Adaptability Score (PAS) estimates the performance of adapting to an unlabeled target domain given a pre-trained feature extractor and a labeled source domain. It helps in the selection of the best pre-trained model and best source domain among many candidates and is highly correlated with the final target accuracy after domain adaptation.
Figure 2
Figure 2. Source and target samples in the embedding space of a pre-trained model. (top left) Ideally, a target sample from a given class should be more similar, and hence closer in the embedding space of the pre-trained model, to a source sample from the same class. (top right) If new discriminative features need to be learned, the chances of overfitting on the source domain during adaptation increase. (bottom) …
Figure 3
Figure 3. The correlation between the PAS score value and the target accuracy after domain adaptation. Each box summarizes the target accuracy of different domain adaptation methods for a given source-target pair and a pre-trained feature extractor. Higher values of the PAS score are strongly correlated with higher target accuracy.
Figure 4
Figure 4. The PAS value and target accuracy for the DANN and MCC methods using different pre-trained feature extractors. The PAS score can help to select the feature extractor that leads to higher accuracy. (left) A→C adaptation in the Office-Home benchmark. (right) W→A adaptation in the Office-31 benchmark.
Figure 5
Figure 5. The PAS value varying with the number of samples for Office-Home. The PAS values are quite robust to varying numbers of samples. Most importantly, the relative order of PAS values for different source domains remains unchanged.
Figure 6
Figure 6. Examples of images misclassified by the domain adaptation method.
read the original abstract

The goal of domain adaptation is to make predictions for unlabeled samples from a target domain with the help of labeled samples from a different but related source domain. The performance of domain adaptation methods is highly influenced by the choice of source domain and pre-trained feature extractor. However, the selection of source data and pre-trained model is not trivial due to the absence of a labeled validation set for the target domain and the large number of available pre-trained models. In this work, we propose PAS, a novel score designed to estimate the transferability of a source domain set and a pre-trained feature extractor to a target classification task before actually performing domain adaptation. PAS leverages the generalization power of pre-trained models and assesses source-target compatibility based on the pre-trained feature embeddings. We integrate PAS into a framework that indicates the most relevant pre-trained model and source domain among multiple candidates, thus improving target accuracy while reducing the computational overhead. Extensive experiments on image classification benchmarks demonstrate that PAS correlates strongly with actual target accuracy and consistently guides the selection of the best-performing pre-trained model and source domain for adaptation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces PAS, a score to estimate target-domain accuracy in unsupervised domain adaptation before running adaptation. PAS is computed from source and target embeddings produced by a frozen pre-trained feature extractor and is used to rank candidate pre-trained models and source domains. The authors claim that PAS exhibits strong correlation with post-adaptation target accuracy and reliably selects the best-performing combination, thereby reducing the need to execute multiple adaptation runs. Experiments on standard image-classification DA benchmarks are reported to support the correlation and selection claims.

Significance. If the predictive link holds, PAS would offer a practical, low-cost method for model and source selection in DA pipelines where many pre-trained backbones and source datasets are available. This addresses a real engineering bottleneck and could be adopted in large-scale DA workflows. The work also contributes an explicit compatibility metric grounded in pre-trained embeddings, which may stimulate further research on transferability estimation.

major comments (2)
  1. [Experiments section] Experiments section (correlation tables): the reported strong correlation between PAS and post-adaptation accuracy is presented without an accompanying ablation that measures the magnitude of embedding-space shift induced by the adaptation step itself (e.g., Procrustes distance or CKA between pre- and post-adaptation features). Because PAS is defined exclusively on the frozen pre-adaptation embeddings, any substantial geometry change during adaptation directly threatens the validity of the predictive claim; this analysis is load-bearing for the central assertion.
  2. [Section 3] Section 3 (PAS definition): the compatibility score is constructed from source-target embedding statistics, yet the manuscript provides no derivation or bound showing why this statistic remains monotonic with accuracy after the feature extractor is updated by the chosen adaptation algorithm. The absence of such a link leaves the method empirically driven and vulnerable to the circularity concern that the same adaptation outcomes used to validate PAS may implicitly influence the embedding choices.
minor comments (2)
  1. [Section 3] Notation for the PAS formula is introduced without an explicit equation number; subsequent references to “the PAS score” become ambiguous when multiple variants are compared.
  2. [Figure 2] Figure captions for the selection-framework diagrams do not list the exact adaptation algorithms and hyper-parameters used in the reported runs, making reproducibility difficult.
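The embedding-shift ablation requested in major comment 1 could use linear CKA, one of the two similarity measures the referee names. A minimal dependency-free sketch, where `X` and `Y` are assumed to hold pre- and post-adaptation features of the same inputs (rows are samples):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two feature matrices
    of shape (n_samples, dim), e.g. the same inputs embedded before
    and after adaptation. Returns 1.0 for identical geometry (up to
    centering/scaling) and values near 0 for unrelated features."""
    X = X - X.mean(axis=0)  # center each feature dimension
    Y = Y - Y.mean(axis=0)
    # CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(num / den)
```

A CKA near 1 between pre- and post-adaptation embeddings would support the claim that the frozen embeddings PAS is computed on remain representative; a low CKA would flag exactly the geometry shift the referee warns about.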

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and insightful comments. We address each major comment point by point below, indicating planned revisions to the manuscript.

read point-by-point responses
  1. Referee: [Experiments section] Experiments section (correlation tables): the reported strong correlation between PAS and post-adaptation accuracy is presented without an accompanying ablation that measures the magnitude of embedding-space shift induced by the adaptation step itself (e.g., Procrustes distance or CKA between pre- and post-adaptation features). Because PAS is defined exclusively on the frozen pre-adaptation embeddings, any substantial geometry change during adaptation directly threatens the validity of the predictive claim; this analysis is load-bearing for the central assertion.

    Authors: We agree that an analysis of embedding-space shifts is important for validating the predictive power of PAS. In the revised manuscript, we will add a new ablation subsection that computes Procrustes distance and CKA between pre-adaptation and post-adaptation feature embeddings across the DA methods and benchmarks used. This will quantify the geometric changes and support the claim that pre-adaptation embeddings remain sufficiently representative. revision: yes

  2. Referee: [Section 3] Section 3 (PAS definition): the compatibility score is constructed from source-target embedding statistics, yet the manuscript provides no derivation or bound showing why this statistic remains monotonic with accuracy after the feature extractor is updated by the chosen adaptation algorithm. The absence of such a link leaves the method empirically driven and vulnerable to the circularity concern that the same adaptation outcomes used to validate PAS may implicitly influence the embedding choices.

    Authors: We acknowledge that PAS is an empirical measure without a formal derivation or bound establishing monotonicity after feature updates. The score is motivated by the transferability of pre-trained representations, and its utility is demonstrated through extensive empirical correlations rather than theory. PAS is computed exclusively on frozen pre-adaptation embeddings, independent of subsequent adaptation steps, which avoids direct circularity in validation. We will expand Section 3 with additional discussion of the empirical motivation, design choices, and limitations. revision: partial

standing simulated objections not resolved
  • Providing a rigorous theoretical derivation or bound guaranteeing monotonicity of PAS with post-adaptation accuracy under general adaptation algorithms.

Circularity Check

0 steps flagged

No significant circularity: PAS is defined on pre-trained embeddings and validated empirically against post-adaptation accuracy

full rationale

The paper defines PAS as a compatibility score computed from frozen pre-trained feature embeddings on source and target data, then integrates it into a selection framework and reports empirical correlations with actual target accuracy after domain adaptation on image classification benchmarks. No equation or claim reduces the predicted accuracy to the PAS inputs by construction, no parameter is fitted on the target outcomes and then renamed as a prediction, and no load-bearing step relies on self-citation chains or imported uniqueness theorems. The validation experiments are external to the definition of PAS, so the derivation chain remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that pre-trained model embeddings encode transferable information about source-target compatibility; PAS itself is an invented scoring entity whose exact computation is not specified in the abstract.

axioms (1)
  • domain assumption Pre-trained models possess generalization power that can be assessed via their feature embeddings for source-target compatibility
    Directly invoked in the abstract as the basis for PAS.
invented entities (1)
  • PAS score no independent evidence
    purpose: Estimate transferability of a source domain set and pre-trained feature extractor to a target classification task
    Newly proposed metric whose definition and computation are introduced in this work.

pith-pipeline@v0.9.0 · 5484 in / 1216 out tokens · 42645 ms · 2026-05-10T17:37:01.526160+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

11 extracted references · 4 canonical work pages

  1. [1]

    Universal domain adaptation from foundation models: A baseline study

    Bin Deng and Kui Jia. Universal domain adaptation from foundation models: A baseline study. arXiv preprint arXiv:2305.11092, 2023.

  2. [2]

    Concept decompositions for large sparse text data using clustering

    Inderjit S. Dhillon and Dharmendra S. Modha. Concept decompositions for large sparse text data using clustering. Machine Learning, 42(1):143–175, 2001.

  3. [3]

    Domain-adversarial training of neural networks

    Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.

  4. [4]

    Minimum class confusion for versatile domain adaptation

    Ying Jin, Ximei Wang, Mingsheng Long, and Jianmin Wang. Minimum class confusion for versatile domain adaptation. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXI, pp. 464–480. Springer, 2020.

  5. [5]

    Transferability estimation using Bhattacharyya class separability

    Michal Pándy, Andrea Agostinelli, Jasper Uijlings, Vittorio Ferrari, and Thomas Mensink. Transferability estimation using Bhattacharyya class separability. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9172–9182, 2022.

  6. [6]

    Syn2Real: A new benchmark for synthetic-to-real visual domain adaptation

    Xingchao Peng, Ben Usman, Kuniaki Saito, Neela Kaushik, Judy Hoffman, and Kate Saenko. Syn2Real: A new benchmark for synthetic-to-real visual domain adaptation. arXiv preprint arXiv:1806.09755, 2018.

  7. [7]

    Adapting visual category models to new domains

    Kate Saenko, Brian Kulis, Mario Fritz, and Trevor Darrell. Adapting visual category models to new domains. In Computer Vision – ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5–11, 2010, Proceedings, Part IV, pp. 213–226. Springer, 2010.

  8. [8]

    Deep CORAL: Correlation alignment for deep domain adaptation

    Baochen Sun and Kate Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In Computer Vision – ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8–10 and 15–16, 2016, Proceedings, Part III, pp. 443–450. Springer, 2016.

  9. [9]

    ERM++: An improved baseline for domain generalization

    Piotr Teterwak, Kuniaki Saito, Theodoros Tsiligkaridis, Kate Saenko, and Bryan A. Plummer. ERM++: An improved baseline for domain generalization. arXiv preprint arXiv:2304.01973, 2023.

  10. [10]

    Deep domain confusion: Maximizing for domain invariance

    Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, and Trevor Darrell. Deep domain confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474, 2014.

  11. [11]

    Deep hashing network for unsupervised domain adaptation

    Hemanth Venkateswara, Jose Eusebio, Shayok Chakraborty, and Sethuraman Panchanathan. Deep hashing network for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5018–5027, 2017.