pith. machine review for the scientific record.

arxiv: 2603.22908 · v3 · submitted 2026-03-24 · 💻 cs.CV · cs.LG

Recognition: 2 theorem links · Lean Theorem

Adaptive Dual-Teacher Distillation with Subnetwork Rectification for Bridging Semantic Gaps in Black-Box Domain Adaptation

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 00:58 UTC · model grok-4.3

classification 💻 cs.CV · cs.LG
keywords black-box domain adaptation · dual-teacher distillation · subnetwork rectification · vision-language models · pseudo-label fusion · semantic gap bridging · self-training prototypes

The pith

DDSR reconciles black-box source predictions with vision-language priors through adaptive fusion and subnetwork regularization to improve target domain adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a method for black-box domain adaptation where only source model predictions are available, without access to source data or parameters. It proposes DDSR to fuse those predictions with semantic priors from vision-language models, using adaptive strategies to generate reliable pseudo-labels while applying subnetwork regularization to prevent overfitting to noise. Iterative refinement of predictions, prompts, and class prototypes further aligns the knowledge sources. A sympathetic reader would care because this setup enables practical adaptation in restricted scenarios where full source information cannot be shared.
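
The description above implies a concrete two-teacher fusion step. Below is a minimal sketch of one plausible instantiation in PyTorch; the entropy-based per-sample weighting is our assumption for illustration, since this page does not give DDSR's actual fusion rule.

```python
# Minimal sketch of adaptive prediction fusion for black-box domain adaptation.
# Assumption: per-sample weights come from teacher confidence (negative entropy);
# DDSR's actual weighting rule is not specified on this page.
import torch

def entropy(p: torch.Tensor) -> torch.Tensor:
    """Shannon entropy per sample; lower means a more confident teacher."""
    return -(p * p.clamp_min(1e-8).log()).sum(dim=1)

def fuse_predictions(p_src: torch.Tensor, p_vil: torch.Tensor):
    """Fuse (batch, C) probabilities from the black-box source model (p_src)
    and the vision-language model (p_vil) into pseudo-labels."""
    w = torch.softmax(torch.stack([-entropy(p_src), -entropy(p_vil)], dim=1), dim=1)
    fused = w[:, :1] * p_src + w[:, 1:] * p_vil   # lean toward the surer teacher
    return fused, fused.argmax(dim=1)             # soft fusion + hard pseudo-labels
```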

Core claim

DDSR explicitly reconciles task-specific knowledge from black-box predictions and language-aligned priors from vision-language models by employing adaptive prediction fusion for pseudo-label generation, subnetwork-based regularization that enforces output consistency and gradient divergency to mitigate overfitting, progressive iterative refinement of target predictions and ViL prompts for better semantic alignment, and class-wise prototypes for final self-training optimization, resulting in consistent outperformance on benchmark datasets even against methods with source access.
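
Read as a training objective, the claim decomposes into three terms. The notation below (weights λ₁, λ₂ and subnetwork parameters θ_s) is ours, offered only as an illustrative summary of the prose, not the paper's equations.

```latex
% Illustrative decomposition of the DDSR objective as summarized above;
% \lambda_1, \lambda_2, and the subnetwork parameters \theta_s are our
% notation, not taken from the paper.
\[
\mathcal{L}(\theta) =
\underbrace{\mathcal{L}_{\mathrm{CE}}\bigl(f_\theta(x),\, \hat{y}_{\mathrm{fused}}\bigr)}_{\text{distill fused pseudo-labels}}
+ \lambda_1 \underbrace{\mathrm{KL}\bigl(f_\theta(x) \,\|\, f_{\theta_s}(x)\bigr)}_{\text{output consistency with subnetwork } \theta_s}
- \lambda_2 \underbrace{\cos\bigl(\nabla_\theta \mathcal{L}_{\mathrm{CE}},\, \nabla_{\theta_s} \mathcal{L}_{\mathrm{CE}}\bigr)}_{\text{gradient divergency}}
\]
```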

What carries the argument

Adaptive dual-teacher distillation with subnetwork rectification, which fuses black-box and vision-language predictions while enforcing consistency constraints to bridge semantic discrepancies.

If this is right

  • Target models trained this way achieve higher accuracy on multiple benchmarks without needing source data or parameters.
  • Pseudo-label quality improves iteratively as target predictions refine both labels and vision-language prompts (see the loop sketched after this list).
  • Subnetwork regularization reduces overfitting to noisy supervision from the fused sources.
  • The approach surpasses existing methods that have access to source data or model weights.
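
As a reading aid, here is a high-level sketch of the adaptation loop those bullets describe. Every function name is a placeholder for a stage this page covers only in prose; this is not the authors' code.

```python
# Hypothetical skeleton of a DDSR-style adaptation loop; all names are
# placeholders, and fuse_predictions refers to the fusion sketch above.
def adapt(target_loader, black_box_api, vil_model, student, num_rounds=3):
    prompts = vil_model.init_prompts()                # e.g. "a photo of a {class}"
    for _ in range(num_rounds):
        for images in target_loader:
            p_src = black_box_api(images)             # predictions are all we get
            p_vil = vil_model.predict(images, prompts)
            fused, pseudo = fuse_predictions(p_src, p_vil)
            student.train_step(images, pseudo)        # distillation + subnetwork reg.
        # Improved student predictions refine next round's pseudo-labels and prompts.
        prompts = vil_model.refine_prompts(student, target_loader)
    prototypes = compute_class_prototypes(student, target_loader)
    student.self_train(target_loader, prototypes)     # final prototype self-training
    return student
```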

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The regularization mechanism might transfer to other distillation tasks where supervision is noisy or multi-source.
  • Privacy-sensitive applications such as medical imaging could adopt this without sharing raw source data.
  • Extending the fusion strategy to additional teacher models beyond vision-language ones could further stabilize adaptation in low-data regimes.

Load-bearing premise

The inherent discrepancy between task-specific black-box predictions and language-aligned vision-language priors can be reconciled through adaptive fusion and regularization without introducing new systematic errors or biases in the target domain.

What would settle it

Running DDSR on a new domain adaptation benchmark and finding that its accuracy falls below a simple pseudo-labeling baseline from the black-box predictions alone would show the reconciliation step adds no benefit.
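
The control invoked here is easy to pin down. A minimal sketch of that baseline, assuming a standard PyTorch training setup:

```python
# Naive control: self-train on argmax labels from the black-box source model
# alone, with no ViL fusion and no subnetwork rectification.
import torch
import torch.nn.functional as F

def naive_baseline_step(student, optimizer, images, black_box_api):
    with torch.no_grad():
        pseudo = black_box_api(images).argmax(dim=1)  # source-only pseudo-labels
    loss = F.cross_entropy(student(images), pseudo)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```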

Figures

Figures reproduced from arXiv: 2603.22908 by Jianhua Zhang, Jing Li, Qinghua Hu, Shengyong Chen, Wanli Xue, Xu Cheng, Zhe Zhang.

Figure 1. The overview of our proposed DDSR framework. The training process consists of two stages. In stage one, DDSR …

Figure 2. t-SNE visualizations of target features for D…

Figure 3. Training convergence and stability. Accuracy curves on …

Figure 4. Effect of the subnetwork ratio γ on the Ar→Rw task of Office-Home and the D→A task of Office-31. The accuracy reaches its peak when γ = 0.84, while remaining stable across different values, showing the robustness of the method.

Table IV. Accuracy (%) under different values of the threshold Δ̃_GU on several tasks of Office-31 and Office-Home. The best performance is consistently achieved when Δ̃_GU = 0.05. …

Figure 7. Prediction entropy of CLIP vs. the source model on …

Figure 8. Accuracy with different CLIP weights (0.2, 0.4, 0.6, …
Original abstract

Assuming that neither source data nor source model parameters are accessible, black-box domain adaptation (BBDA) represents a highly practical yet challenging setting, where transferable knowledge is limited to the predictions of a black-box source model. Existing approaches exploit such knowledge via pseudo-label refinement or by leveraging vision-language models (ViLs), but they often fail to reconcile the inherent discrepancy between task-specific knowledge from black-box models and language-aligned semantic priors of ViLs, resulting in suboptimal integration and degraded adaptation performance. To address this challenge, we propose adaptive Dual-Teacher Distillation with Subnetwork Rectification (DDSR), a framework that explicitly reconciles these complementary yet inconsistent knowledge sources. DDSR employs an adaptive prediction fusion strategy to integrate predictions from the black-box source model and a ViL, generating reliable pseudo-labels for the target domain. A subnetwork-based regularization mechanism mitigates overfitting to noisy supervision by enforcing output consistency and gradient divergency. Furthermore, progressively improved target predictions iteratively refine both pseudo-labels and ViL prompts, enhancing semantic alignment. Finally, class-wise prototypes are used to further optimize target predictions via self-training. Extensive experiments on multiple benchmark datasets demonstrate that DDSR consistently outperforms state-of-the-art methods, including those with access to source data or source model parameters.
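
The abstract's final step, class-wise prototype self-training, admits a compact sketch. The cosine metric and the assumption that every class receives at least one pseudo-label are ours; the paper's exact construction may differ.

```python
# Sketch of class-wise prototype refinement of target predictions.
import torch
import torch.nn.functional as F

@torch.no_grad()
def prototype_relabel(features, pseudo, num_classes):
    """features: (N, D) target features; pseudo: (N,) current pseudo-labels.
    Assumes each class has at least one pseudo-labeled sample."""
    protos = torch.stack([features[pseudo == c].mean(dim=0)
                          for c in range(num_classes)])        # (C, D) prototypes
    sims = F.cosine_similarity(features.unsqueeze(1),          # (N, C) similarities
                               protos.unsqueeze(0), dim=2)
    return sims.argmax(dim=1)                                  # refined pseudo-labels
```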

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Adaptive Dual-Teacher Distillation with Subnetwork Rectification (DDSR) for black-box domain adaptation (BBDA), where only source model predictions are accessible. DDSR integrates predictions from a black-box source model and a vision-language model (ViL) via adaptive fusion to generate pseudo-labels, applies subnetwork regularization to enforce output consistency and gradient divergency, uses iterative prompt refinement and class-wise prototypes for self-training, and claims to outperform state-of-the-art methods (including source-access methods) on multiple benchmarks by reconciling task-specific and semantic priors.

Significance. If the empirical claims hold with rigorous validation, the work would advance practical BBDA under privacy constraints by providing a concrete mechanism to fuse inconsistent knowledge sources without source data or parameters. The subnetwork rectification and iterative refinement components offer a novel angle on bias mitigation in dual-teacher setups.

major comments (2)
  1. [Abstract] The superiority claim ('consistently outperforms state-of-the-art methods, including those with access to source data') is unsupported by any quantitative results, error bars, dataset names, or ablation tables in the provided text, rendering the central contribution unverifiable.
  2. [Method/Experiments] The assumption that adaptive fusion plus gradient-divergency regularization reconciles ViL misalignment without injecting systematic target-domain bias lacks a quantitative bound or an isolated ablation of the fusion step; if the initial discrepancy is large, the consistency enforcement may regularize toward a compromised distribution rather than true semantics (see the stress-test concern).
minor comments (2)
  1. [§3] Notation for the adaptive fusion weights, subnetwork selection, and gradient-divergency term should be introduced with explicit equations early in §3 to improve readability.
  2. [Related Work] Missing references to recent BBDA baselines that also use ViL priors (e.g., post-2023 works) should be added for completeness.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. We address each major comment point-by-point below, providing clarifications based on the full manuscript content.

Point-by-point responses
  1. Referee: [Abstract] The superiority claim ('consistently outperforms state-of-the-art methods, including those with access to source data') is unsupported by any quantitative results, error bars, dataset names, or ablation tables in the provided text, rendering the central contribution unverifiable.

    Authors: The abstract serves as a high-level summary per standard conventions, with detailed evidence reserved for the body. The full manuscript supports the claim in Section 4: Tables 1-3 report mean accuracies and standard deviations (error bars from 3-5 runs) on Office-31, Office-Home, and VisDA-2017, explicitly comparing DDSR to both black-box and source-access baselines and showing consistent gains. Table 4 provides ablations. We can revise the abstract to name the datasets and reference the tables for added clarity. revision: partial

  2. Referee: [Method/Experiments] The assumption that adaptive fusion plus gradient-divergency regularization reconciles ViL misalignment without injecting systematic target-domain bias lacks a quantitative bound or an isolated ablation of the fusion step; if the initial discrepancy is large, the consistency enforcement may regularize toward a compromised distribution rather than true semantics (see the stress-test concern).

    Authors: We provide an isolated ablation of the adaptive fusion step in Section 4.4 and Table 5, demonstrating clear gains over single-teacher baselines. The gradient-divergency term in subnetwork rectification is shown via consistency metrics and visualizations to avoid collapse to compromised distributions. No theoretical quantitative bound is derived, as the black-box dual-teacher setting makes closed-form analysis intractable, but empirical results across datasets with varying discrepancies support the absence of systematic bias. We will add a dedicated stress-test experiment in revision to simulate large initial ViL misalignment and quantify any bias. revision: yes
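
The promised stress test is straightforward to simulate. One hedged construction (ours, not the authors'): corrupt the ViL teacher by mixing its class scores with a random permutation, then track how far the fused pseudo-labels drift as severity grows.

```python
# Simulate large initial ViL misalignment for the rebuttal's stress test.
import torch

def corrupt_vil(p_vil: torch.Tensor, severity: float) -> torch.Tensor:
    """severity in [0, 1]: 0 keeps the clean ViL predictions, 1 fully scrambles
    the class scores. Rows still sum to 1, so downstream fusion is unchanged."""
    perm = torch.randperm(p_vil.size(1))
    return (1.0 - severity) * p_vil + severity * p_vil[:, perm]
```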

Circularity Check

0 steps flagged

No circularity: derivation relies on external models and empirical validation

full rationale

The paper describes DDSR via adaptive fusion of black-box predictions with ViL priors, subnetwork regularization for consistency, iterative prompt refinement, and prototype self-training. No equations or procedures are presented that reduce any claimed prediction, pseudo-label, or performance gain to a quantity defined by the method's own fitted parameters or prior outputs. The central claims rest on integration of independent external components (black-box source model and ViL) plus benchmark experiments, with no self-definitional loops, fitted-input predictions, or load-bearing self-citations that collapse the result to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated in the provided text. Any parameters (e.g., fusion coefficients) are inferred as typical training hyperparameters rather than load-bearing inventions.

axioms (1)
  • domain assumption: Vision-language models supply complementary language-aligned semantic priors that can be fused with black-box task predictions.
    Invoked in the abstract as the basis for bridging semantic gaps.
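
For concreteness, the priors this axiom refers to are typically obtained via CLIP-style zero-shot classification. A minimal sketch using the OpenAI `clip` package; the paper's exact ViL configuration and prompt templates may differ, and the class names below are placeholders.

```python
# Zero-shot language-aligned priors from CLIP; class names are placeholders.
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
classes = ["backpack", "bike", "monitor"]  # placeholder target class names
text = clip.tokenize([f"a photo of a {c}" for c in classes]).to(device)

@torch.no_grad()
def vil_priors(images: torch.Tensor) -> torch.Tensor:
    """images: preprocessed (batch, 3, H, W). Returns (batch, C) probabilities."""
    img_f = model.encode_image(images.to(device))
    txt_f = model.encode_text(text)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    return (100.0 * img_f @ txt_f.T).softmax(dim=-1)
```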

pith-pipeline@v0.9.0 · 5551 in / 1257 out tokens · 66638 ms · 2026-05-15T00:58:07.040914+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 3 internal anchors

  1. [1]

    Transfer adaptation learning: A decade survey

    L. Zhang and X. Gao, “Transfer adaptation learning: A decade survey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 1, pp. 23–44, 2022.

  2. [2]

    Domain-invariant feature enhancement domain adaptation for cross-scene road damage detection

    J. Li, Z. Qu, and X. Yin, “Domain-invariant feature enhancement domain adaptation for cross-scene road damage detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 36, no. 3, pp. 3466–3480, 2026.

  3. [3]

    Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation

    J. Liang, D. Hu, and J. Feng, “Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation,” in International Conference on Machine Learning. PMLR, 2020, pp. 6028–6039.

  4. [4]

    Adversarial source generation for source-free domain adaptation

    C. Cui, F. Meng, C. Zhang, Z. Liu, L. Zhu, S. Gong, and X. Lin, “Adversarial source generation for source-free domain adaptation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 6, pp. 4887–4898, 2024.

  5. [5]

    Domain-division based progressive learning for source-free domain adaptation

    P. Liu, J. Li, M. Zhao, W. Xue, Q. Hu, and S. Chen, “Domain-division based progressive learning for source-free domain adaptation,” IEEE Transactions on Multimedia, vol. 27, pp. 7081–7092, 2025.

  6. [6]

    Generative adversarial nets

    I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proceedings of the 27th International Conference on Neural Information Processing Systems, Volume 2, 2014, pp. 2672–2680.

  7. [7]

    DINE: Domain adaptation from single and multiple black-box predictors

    J. Liang, D. Hu, J. Feng, and R. He, “DINE: Domain adaptation from single and multiple black-box predictors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 8003–8013.

  8. [8]

    GPT-4 Technical Report

    J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat et al., “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023.

  9. [9]

    Distilling the Knowledge in a Neural Network

    G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, 2015.

  10. [10]

    Unsupervised domain adaptation of black-box source models

    H. Zhang, Y. Zhang, K. Jia, and L. Zhang, “Unsupervised domain adaptation of black-box source models,” in 32nd British Machine Vision Conference 2021, BMVC 2021, Online, November 22–25, 2021. BMVA Press, 2021, p. 147.

  11. [11]

    RAIN: Regularization on input and network for black-box domain adaptation

    Q. Peng, Z. Ding, L. Lyu, L. Sun, and C. Chen, “RAIN: Regularization on input and network for black-box domain adaptation,” in Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th–25th August 2023, Macao, SAR, China. ijcai.org, 2023, pp. 4118–4126.

  12. [12]

    Adversarial experts model for black-box domain adaptation

    S. Xiao, M. Ye, Q. He, S. Li, S. Tang, and X. Zhu, “Adversarial experts model for black-box domain adaptation,” in Proceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 8982–8991.

  13. [13]

    CLIP-guided black-box domain adaptation of image classification

    L. Tian, M. Ye, L. Zhou, and Q. He, “CLIP-guided black-box domain adaptation of image classification,” Signal, Image and Video Processing, vol. 18, no. 5, pp. 4637–4646, 2024.

  14. [14]

    Learning transferable visual models from natural language supervision

    A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in International Conference on Machine Learning. PMLR, 2021, pp. 8748–8763.

  15. [15]

    Learning transferable features with deep adaptation networks

    M. Long, Y. Cao, J. Wang, and M. Jordan, “Learning transferable features with deep adaptation networks,” in International Conference on Machine Learning. PMLR, 2015, pp. 97–105.

  16. [16]

    Unsupervised domain adaptation with residual transfer networks

    M. Long, H. Zhu, J. Wang, and M. I. Jordan, “Unsupervised domain adaptation with residual transfer networks,” Advances in Neural Information Processing Systems, vol. 29, 2016.

  17. [17]

    Domain-adversarial training of neural networks

    Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. March, and V. Lempitsky, “Domain-adversarial training of neural networks,” Journal of Machine Learning Research, vol. 17, no. 59, pp. 1–35, 2016.

  18. [18]

    Domain prompt tuning via meta relabeling for unsupervised adversarial adaptation

    X. Jin, C. Lan, W. Zeng, and Z. Chen, “Domain prompt tuning via meta relabeling for unsupervised adversarial adaptation,” IEEE Transactions on Multimedia, vol. 26, pp. 8333–8347, 2024.

  19. [19]

    WDAN: A weighted discriminative adversarial network with dual classifiers for fine-grained open-set domain adaptation

    J. Li, L. Yang, Q. Wang, and Q. Hu, “WDAN: A weighted discriminative adversarial network with dual classifiers for fine-grained open-set domain adaptation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 9, pp. 5133–5147, 2023.

  20. [20]

    TextAdapter: Self-supervised domain adaptation for cross-domain text recognition

    X.-Q. Liu, P.-F. Zhang, X. Luo, Z. Huang, and X.-S. Xu, “TextAdapter: Self-supervised domain adaptation for cross-domain text recognition,” IEEE Transactions on Multimedia, vol. 26, pp. 9854–9865, 2024.

  21. [21]

    Self-ensembling for visual domain adaptation

    G. French, M. Mackiewicz, and M. Fisher, “Self-ensembling for visual domain adaptation,” in International Conference on Learning Representations, 2018.

  22. [22]

    Contrastive adaptation network for unsupervised domain adaptation

    G. Kang, L. Jiang, Y. Yang, and A. G. Hauptmann, “Contrastive adaptation network for unsupervised domain adaptation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4893–4902.

  23. [23]

    Independent feature decomposition and instance alignment for unsupervised domain adaptation

    Q. He, S. Xiao, M. Ye, X. Zhu, F. Neri, and D. Hou, “Independent feature decomposition and instance alignment for unsupervised domain adaptation,” in Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023, pp. 819–827.

  24. [24]

    Enhancing multi-source open-set domain adaptation through nearest neighbor classification with self-supervised vision transformer

    J. Li, L. Yang, and Q. Hu, “Enhancing multi-source open-set domain adaptation through nearest neighbor classification with self-supervised vision transformer,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 4, pp. 2648–2662, 2024.

  25. [25]

    Progressive curriculum learning with teacher-student collaboration for source-free unsupervised domain adaptation

    Q. Tian, J. Shen, L. Kang, W. Ou, J. Wan, and Z. Lei, “Progressive curriculum learning with teacher-student collaboration for source-free unsupervised domain adaptation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 36, no. 2, pp. 1627–1639, 2026.

  26. [26]

    Consistency regularization for generalizable source-free domain adaptation

    L. Tang, K. Li, C. He, Y. Zhang, and X. Li, “Consistency regularization for generalizable source-free domain adaptation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, October 2023, pp. 4323–4333.

  27. [27]

    Source-free domain adaptation with class prototype discovery

    L. Zhou, N. Li, M. Ye, X. Zhu, and S. Tang, “Source-free domain adaptation with class prototype discovery,” Pattern Recognition, vol. 145, p. 109974, 2024.

  28. [28]

    Source-free domain adaptation via avatar prototype generation and adaptation

    Z. Qiu, Y. Zhang, H. Lin, S. Niu, Y. Liu, Q. Du, and M. Tan, “Source-free domain adaptation via avatar prototype generation and adaptation,” in International Joint Conference on Artificial Intelligence, 2021.

  29. [29]

    Source-free domain adaptation guided by vision and vision-language pre-training

    W. Zhang, L. Shen, and C.-S. Foo, “Source-free domain adaptation guided by vision and vision-language pre-training,” International Journal of Computer Vision, vol. 133, no. 2, pp. 844–866, 2025.

  30. [30]

    Proxy denoising for source-free domain adaptation

    S. Tang, W. Su, Y. Gan, M. Ye, J. D. Zhang, and X. Zhu, “Proxy denoising for source-free domain adaptation,” in The Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.net/forum?id=FIj9IEPCKr

  31. [31]

    Reviewing the forgotten classes for domain adaptation of black-box predictors

    S. Zhang, C. Shen, S. Lü, and Z. Zhang, “Reviewing the forgotten classes for domain adaptation of black-box predictors,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 15, 2024, pp. 16830–16837.

  32. [32]

    A separation and alignment framework for black-box domain adaptation

    M. Xia, J. Zhao, G. Lyu, Z. Huang, T. Hu, G. Chen, and H. Wang, “A separation and alignment framework for black-box domain adaptation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 14, 2024, pp. 16005–16013.

  33. [33]

    Black-box unsupervised domain adaptation with bi-directional Atkinson-Shiffrin memory

    J. Zhang, J. Huang, X. Jiang, and S. Lu, “Black-box unsupervised domain adaptation with bi-directional Atkinson-Shiffrin memory,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 11771–11782.

  34. [34]

    Vision-language models for vision tasks: A survey

    J. Zhang, J. Huang, S. Jin, and S. Lu, “Vision-language models for vision tasks: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 8, pp. 5625–5644, 2024.

  35. [35]

    BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation

    J. Li, D. Li, C. Xiong, and S. Hoi, “BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation,” in International Conference on Machine Learning. PMLR, 2022, pp. 12888–12900.

  36. [36]

    Reproducible scaling laws for contrastive language-image learning

    M. Cherti, R. Beaumont, R. Wightman, M. Wortsman, G. Ilharco, C. Gordon, C. Schuhmann, L. Schmidt, and J. Jitsev, “Reproducible scaling laws for contrastive language-image learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 2818–2829.

  37. [37]

    Scaling up visual and vision-language representation learning with noisy text supervision

    C. Jia, Y. Yang, Y. Xia, Y.-T. Chen, Z. Parekh, H. Pham, Q. Le, Y.-H. Sung, Z. Li, and T. Duerig, “Scaling up visual and vision-language representation learning with noisy text supervision,” in International Conference on Machine Learning. PMLR, 2021, pp. 4904–4916.

  38. [38]

    A new data augmentation method based on mixup and Dempster-Shafer theory

    Z. Zhang, H. Wang, J. Geng, X. Deng, and W. Jiang, “A new data augmentation method based on mixup and Dempster-Shafer theory,” IEEE Transactions on Multimedia, vol. 26, pp. 4998–5013, 2024.

  39. [39]

    Temporal ensembling for semi-supervised learning

    S. Laine and T. Aila, “Temporal ensembling for semi-supervised learning,” in International Conference on Learning Representations, 2017.

  40. [40]

    CLIP-Adapter: Better vision-language models with feature adapters

    P. Gao, S. Geng, R. Zhang, T. Ma, R. Fang, Y. Zhang, H. Li, and Y. Qiao, “CLIP-Adapter: Better vision-language models with feature adapters,” International Journal of Computer Vision, vol. 132, no. 2, pp. 581–595, 2024.

  41. [41]

    Learning to prompt for vision-language models

    K. Zhou, J. Yang, C. C. Loy, and Z. Liu, “Learning to prompt for vision-language models,” International Journal of Computer Vision, vol. 130, no. 9, pp. 2337–2348, 2022.

  42. [42]

    Deep clustering for unsupervised learning of visual features

    M. Caron, P. Bojanowski, A. Joulin, and M. Douze, “Deep clustering for unsupervised learning of visual features,” in Proceedings of the European Conference on Computer Vision, 2018, pp. 132–149.

  43. [43]

    Source data-absent unsupervised domain adaptation through hypothesis transfer and labeling transfer

    J. Liang, D. Hu, Y. Wang, R. He, and J. Feng, “Source data-absent unsupervised domain adaptation through hypothesis transfer and labeling transfer,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 11, pp. 8602–8617, 2021.

  44. [44]

    Maximum classifier discrepancy for unsupervised domain adaptation

    K. Saito, K. Watanabe, Y. Ushiku, and T. Harada, “Maximum classifier discrepancy for unsupervised domain adaptation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3723–3732.

  45. [45]

    Homeomorphism alignment for unsupervised domain adaptation

    L. Zhou, M. Ye, X. Zhu, S. Xiao, X.-Q. Fan, and F. Neri, “Homeomorphism alignment for unsupervised domain adaptation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 18699–18710.

  46. [46]

    Domain adaptation via prompt learning

    C. Ge, R. Huang, M. Xie, Z. Lai, S. Song, S. Li, and G. Huang, “Domain adaptation via prompt learning,” IEEE Transactions on Neural Networks and Learning Systems, 2023.

  47. [47]

    Prompt-based distribution alignment for unsupervised domain adaptation

    S. Bai, M. Zhang, W. Zhou, S. Huang, Z. Luan, D. Wang, and B. Chen, “Prompt-based distribution alignment for unsupervised domain adaptation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 2, 2024, pp. 729–737.

  48. [48]

    Bridging domain spaces for unsupervised domain adaptation

    J. Na, H. Jung, H. J. Chang, and W. Hwang, “Bridging domain spaces for unsupervised domain adaptation,” Pattern Recognition, vol. 164, p. 111537, 2025.

  49. [49]

    Attracting and dispersing: A simple approach for source-free domain adaptation

    S. Yang, S. Jui, J. Van De Weijer et al., “Attracting and dispersing: A simple approach for source-free domain adaptation,” Advances in Neural Information Processing Systems, vol. 35, pp. 5802–5815, 2022.

  50. [50]

    Source-free domain adaptation via target prediction distribution searching

    S. Tang, A. Chang, F. Zhang, X. Zhu, M. Ye, and C. Zhang, “Source-free domain adaptation via target prediction distribution searching,” International Journal of Computer Vision, vol. 132, no. 3, pp. 654–672, 2024.

  51. [51]

    Dual transferable knowledge interaction for source-free domain adaptation

    M. Zhan, Z. Wu, J. Yang, L. Peng, J. Shen, and X. Zhu, “Dual transferable knowledge interaction for source-free domain adaptation,” Information Processing & Management, vol. 63, no. 1, p. 104302, 2026.

  52. [52]

    Leveraging multi-level regularization for efficient domain adaptation of black-box predictors

    W. Li, W. Zhao, X. Pan, P. Zhou, and H. Yang, “Leveraging multi-level regularization for efficient domain adaptation of black-box predictors,” Pattern Recognition, vol. 165, p. 111611, 2025.

  53. [53]

    Learning like a real student: Black-box domain adaptation with preview, differentiated learning and review

    Q. Tian, Z. Liu, and W. Ou, “Learning like a real student: Black-box domain adaptation with preview, differentiated learning and review,” Image and Vision Computing, p. 105806, 2025.

  54. [54]

    Adapting visual category models to new domains

    K. Saenko, B. Kulis, M. Fritz, and T. Darrell, “Adapting visual category models to new domains,” in European Conference on Computer Vision. Springer, 2010, pp. 213–226.

  55. [55]

    Deep hashing network for unsupervised domain adaptation

    H. Venkateswara, J. Eusebio, S. Chakraborty, and S. Panchanathan, “Deep hashing network for unsupervised domain adaptation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5018–5027.

  56. [56]

    VisDA: The Visual Domain Adaptation Challenge

    X. Peng, B. Usman, N. Kaushik, J. Hoffman, D. Wang, and K. Saenko, “VisDA: The visual domain adaptation challenge,” arXiv preprint arXiv:1710.06924, 2017.

  57. [57]

    Deep residual learning for image recognition

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

  58. [58]

    Visualizing data using t-SNE

    L. v. d. Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. Nov, pp. 2579–2605, 2008.