pith. machine review for the scientific record.

arxiv: 2603.22908 · v3 · submitted 2026-03-24 · 💻 cs.CV · cs.LG

Recognition: 2 theorem links · Lean Theorem

Adaptive Dual-Teacher Distillation with Subnetwork Rectification for Bridging Semantic Gaps in Black-Box Domain Adaptation

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 00:58 UTC · model grok-4.3

classification 💻 cs.CV · cs.LG
keywords black-box domain adaptation · dual-teacher distillation · subnetwork rectification · vision-language models · pseudo-label fusion · semantic gap bridging · self-training prototypes

The pith

DDSR reconciles black-box source predictions with vision-language priors through adaptive fusion and subnetwork regularization to improve target domain adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a method for black-box domain adaptation where only source model predictions are available, without access to source data or parameters. It proposes DDSR to fuse those predictions with semantic priors from vision-language models, using adaptive strategies to generate reliable pseudo-labels while applying subnetwork regularization to prevent overfitting to noise. Iterative refinement of predictions, prompts, and class prototypes further aligns the knowledge sources. A sympathetic reader would care because this setup enables practical adaptation in restricted scenarios where full source information cannot be shared.
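
The description above implies a concrete two-teacher fusion step. Below is a minimal sketch of one plausible instantiation in PyTorch; the entropy-based per-sample weighting is our assumption for illustration, since this page does not give DDSR's actual fusion rule.

```python
# Minimal sketch of adaptive prediction fusion for black-box domain adaptation.
# Assumption: per-sample weights come from teacher confidence (negative entropy);
# DDSR's actual weighting rule is not specified on this page.
import torch

def entropy(p: torch.Tensor) -> torch.Tensor:
    """Shannon entropy per sample; lower means a more confident teacher."""
    return -(p * p.clamp_min(1e-8).log()).sum(dim=1)

def fuse_predictions(p_src: torch.Tensor, p_vil: torch.Tensor):
    """Fuse (batch, C) probabilities from the black-box source model (p_src)
    and the vision-language model (p_vil) into pseudo-labels."""
    w = torch.softmax(torch.stack([-entropy(p_src), -entropy(p_vil)], dim=1), dim=1)
    fused = w[:, :1] * p_src + w[:, 1:] * p_vil   # lean toward the surer teacher
    return fused, fused.argmax(dim=1)             # soft fusion + hard pseudo-labels
```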

Core claim

DDSR explicitly reconciles task-specific knowledge from black-box predictions and language-aligned priors from vision-language models by employing adaptive prediction fusion for pseudo-label generation, subnetwork-based regularization that enforces output consistency and gradient divergency to mitigate overfitting, progressive iterative refinement of target predictions and ViL prompts for better semantic alignment, and class-wise prototypes for final self-training optimization, resulting in consistent outperformance on benchmark datasets even against methods with source access.
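
Read as a training objective, the claim decomposes into three terms. The notation below (weights λ₁, λ₂ and subnetwork parameters θ_s) is ours, offered only as an illustrative summary of the prose, not the paper's equations.

```latex
% Illustrative decomposition of the DDSR objective as summarized above;
% \lambda_1, \lambda_2, and the subnetwork parameters \theta_s are our
% notation, not taken from the paper.
\[
\mathcal{L}(\theta) =
\underbrace{\mathcal{L}_{\mathrm{CE}}\bigl(f_\theta(x),\, \hat{y}_{\mathrm{fused}}\bigr)}_{\text{distill fused pseudo-labels}}
+ \lambda_1 \underbrace{\mathrm{KL}\bigl(f_\theta(x) \,\|\, f_{\theta_s}(x)\bigr)}_{\text{output consistency with subnetwork } \theta_s}
- \lambda_2 \underbrace{\cos\bigl(\nabla_\theta \mathcal{L}_{\mathrm{CE}},\, \nabla_{\theta_s} \mathcal{L}_{\mathrm{CE}}\bigr)}_{\text{gradient divergency}}
\]
```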

What carries the argument

Adaptive dual-teacher distillation with subnetwork rectification, which fuses black-box and vision-language predictions while enforcing consistency constraints to bridge semantic discrepancies.

If this is right

  • Target models trained this way achieve higher accuracy on multiple benchmarks without needing source data or parameters.
  • Pseudo-label quality improves iteratively as target predictions refine both labels and vision-language prompts (see the loop sketched after this list).
  • Subnetwork regularization reduces overfitting to noisy supervision from the fused sources.
  • The approach surpasses existing methods that have access to source data or model weights.
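
As a reading aid, here is a high-level sketch of the adaptation loop those bullets describe. Every function name is a placeholder for a stage this page covers only in prose; this is not the authors' code.

```python
# Hypothetical skeleton of a DDSR-style adaptation loop; all names are
# placeholders, and fuse_predictions refers to the fusion sketch above.
def adapt(target_loader, black_box_api, vil_model, student, num_rounds=3):
    prompts = vil_model.init_prompts()                # e.g. "a photo of a {class}"
    for _ in range(num_rounds):
        for images in target_loader:
            p_src = black_box_api(images)             # predictions are all we get
            p_vil = vil_model.predict(images, prompts)
            fused, pseudo = fuse_predictions(p_src, p_vil)
            student.train_step(images, pseudo)        # distillation + subnetwork reg.
        # Improved student predictions refine next round's pseudo-labels and prompts.
        prompts = vil_model.refine_prompts(student, target_loader)
    prototypes = compute_class_prototypes(student, target_loader)
    student.self_train(target_loader, prototypes)     # final prototype self-training
    return student
```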

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The regularization mechanism might transfer to other distillation tasks where supervision is noisy or multi-source.
  • Privacy-sensitive applications such as medical imaging could adopt this without sharing raw source data.
  • Extending the fusion strategy to additional teacher models beyond vision-language ones could further stabilize adaptation in low-data regimes.

Load-bearing premise

The inherent discrepancy between task-specific black-box predictions and language-aligned vision-language priors can be reconciled through adaptive fusion and regularization without introducing new systematic errors or biases in the target domain.

What would settle it

Running DDSR on a new domain adaptation benchmark and finding that its accuracy falls below a simple pseudo-labeling baseline from the black-box predictions alone would show the reconciliation step adds no benefit.
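
The control invoked here is easy to pin down. A minimal sketch of that baseline, assuming a standard PyTorch training setup:

```python
# Naive control: self-train on argmax labels from the black-box source model
# alone, with no ViL fusion and no subnetwork rectification.
import torch
import torch.nn.functional as F

def naive_baseline_step(student, optimizer, images, black_box_api):
    with torch.no_grad():
        pseudo = black_box_api(images).argmax(dim=1)  # source-only pseudo-labels
    loss = F.cross_entropy(student(images), pseudo)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```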

Figures

Figures reproduced from arXiv: 2603.22908 by Jianhua Zhang, Jing Li, Qinghua Hu, Shengyong Chen, Wanli Xue, Xu Cheng, Zhe Zhang.

Figure 1. The overview of our proposed DDSR framework. The training process consists of two stages. In stage one, DDSR …

Figure 2. t-SNE visualizations of target features for D…

Figure 3. Training convergence and stability. Accuracy curves on …

Figure 4. Effect of the subnetwork ratio γ on the Ar→Rw task of Office-Home and the D→A task of Office-31. The accuracy reaches its peak when γ = 0.84, while remaining stable across different values, showing the robustness of the method.

Table IV. Accuracy (%) under different values of the threshold Δ̃_GU on several tasks of Office-31 and Office-Home. The best performance is consistently achieved when Δ̃_GU = 0.05. …

Figure 7. Prediction entropy of CLIP vs. the source model on …

Figure 8. Accuracy with different CLIP weights (0.2, 0.4, 0.6, …
Original abstract

Assuming that neither source data nor source model parameters are accessible, black-box domain adaptation (BBDA) represents a highly practical yet challenging setting, where transferable knowledge is limited to the predictions of a black-box source model. Existing approaches exploit such knowledge via pseudo-label refinement or by leveraging vision-language models (ViLs), but they often fail to reconcile the inherent discrepancy between task-specific knowledge from black-box models and language-aligned semantic priors of ViLs, resulting in suboptimal integration and degraded adaptation performance. To address this challenge, we propose adaptive Dual-Teacher Distillation with Subnetwork Rectification (DDSR), a framework that explicitly reconciles these complementary yet inconsistent knowledge sources. DDSR employs an adaptive prediction fusion strategy to integrate predictions from the black-box source model and a ViL, generating reliable pseudo-labels for the target domain. A subnetwork-based regularization mechanism mitigates overfitting to noisy supervision by enforcing output consistency and gradient divergency. Furthermore, progressively improved target predictions iteratively refine both pseudo-labels and ViL prompts, enhancing semantic alignment. Finally, class-wise prototypes are used to further optimize target predictions via self-training. Extensive experiments on multiple benchmark datasets demonstrate that DDSR consistently outperforms state-of-the-art methods, including those with access to source data or source model parameters.
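
The abstract's final step, class-wise prototype self-training, admits a compact sketch. The cosine metric and the assumption that every class receives at least one pseudo-label are ours; the paper's exact construction may differ.

```python
# Sketch of class-wise prototype refinement of target predictions.
import torch
import torch.nn.functional as F

@torch.no_grad()
def prototype_relabel(features, pseudo, num_classes):
    """features: (N, D) target features; pseudo: (N,) current pseudo-labels.
    Assumes each class has at least one pseudo-labeled sample."""
    protos = torch.stack([features[pseudo == c].mean(dim=0)
                          for c in range(num_classes)])        # (C, D) prototypes
    sims = F.cosine_similarity(features.unsqueeze(1),          # (N, C) similarities
                               protos.unsqueeze(0), dim=2)
    return sims.argmax(dim=1)                                  # refined pseudo-labels
```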

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Adaptive Dual-Teacher Distillation with Subnetwork Rectification (DDSR) for black-box domain adaptation (BBDA), where only source model predictions are accessible. DDSR integrates predictions from a black-box source model and a vision-language model (ViL) via adaptive fusion to generate pseudo-labels, applies subnetwork regularization to enforce output consistency and gradient divergency, uses iterative prompt refinement and class-wise prototypes for self-training, and claims to outperform state-of-the-art methods (including source-access methods) on multiple benchmarks by reconciling task-specific and semantic priors.

Significance. If the empirical claims hold with rigorous validation, the work would advance practical BBDA under privacy constraints by providing a concrete mechanism to fuse inconsistent knowledge sources without source data or parameters. The subnetwork rectification and iterative refinement components offer a novel angle on bias mitigation in dual-teacher setups.

major comments (2)
  1. [Abstract] The superiority claim ('consistently outperforms state-of-the-art methods, including those with access to source data') is unsupported by any quantitative results, error bars, dataset names, or ablation tables in the provided text, rendering the central contribution unverifiable.
  2. [Method/Experiments] The assumption that adaptive fusion plus gradient-divergency regularization reconciles ViL misalignment without injecting systematic target-domain bias lacks a quantitative bound or an isolated ablation of the fusion step; if the initial discrepancy is large, the consistency enforcement may regularize toward a compromised distribution rather than true semantics (see the stress-test concern).
minor comments (2)
  1. [§3] Notation for the adaptive fusion weights, subnetwork selection, and gradient-divergency term should be introduced with explicit equations early in §3 to improve readability.
  2. [Related Work] Missing references to recent BBDA baselines that also use ViL priors (e.g., post-2023 works) should be added for completeness.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. We address each major comment point-by-point below, providing clarifications based on the full manuscript content.

Point-by-point responses
  1. Referee: [Abstract] The superiority claim ('consistently outperforms state-of-the-art methods, including those with access to source data') is unsupported by any quantitative results, error bars, dataset names, or ablation tables in the provided text, rendering the central contribution unverifiable.

    Authors: The abstract serves as a high-level summary per standard conventions, with detailed evidence reserved for the body. The full manuscript supports the claim in Section 4: Tables 1-3 report mean accuracies and standard deviations (error bars from 3-5 runs) on Office-31, Office-Home, and VisDA-2017, explicitly comparing DDSR to both black-box and source-access baselines and showing consistent gains. Table 4 provides ablations. We can revise the abstract to name the datasets and reference the tables for added clarity. revision: partial

  2. Referee: [Method/Experiments] The assumption that adaptive fusion plus gradient-divergency regularization reconciles ViL misalignment without injecting systematic target-domain bias lacks a quantitative bound or an isolated ablation of the fusion step; if the initial discrepancy is large, the consistency enforcement may regularize toward a compromised distribution rather than true semantics (see the stress-test concern).

    Authors: We provide an isolated ablation of the adaptive fusion step in Section 4.4 and Table 5, demonstrating clear gains over single-teacher baselines. The gradient-divergency term in subnetwork rectification is shown via consistency metrics and visualizations to avoid collapse to compromised distributions. No theoretical quantitative bound is derived, as the black-box dual-teacher setting makes closed-form analysis intractable, but empirical results across datasets with varying discrepancies support the absence of systematic bias. We will add a dedicated stress-test experiment in revision to simulate large initial ViL misalignment and quantify any bias. revision: yes
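
The promised stress test is straightforward to simulate. One hedged construction (ours, not the authors'): corrupt the ViL teacher by mixing its class scores with a random permutation, then track how far the fused pseudo-labels drift as severity grows.

```python
# Simulate large initial ViL misalignment for the rebuttal's stress test.
import torch

def corrupt_vil(p_vil: torch.Tensor, severity: float) -> torch.Tensor:
    """severity in [0, 1]: 0 keeps the clean ViL predictions, 1 fully scrambles
    the class scores. Rows still sum to 1, so downstream fusion is unchanged."""
    perm = torch.randperm(p_vil.size(1))
    return (1.0 - severity) * p_vil + severity * p_vil[:, perm]
```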

Circularity Check

0 steps flagged

No circularity: derivation relies on external models and empirical validation

full rationale

The paper describes DDSR via adaptive fusion of black-box predictions with ViL priors, subnetwork regularization for consistency, iterative prompt refinement, and prototype self-training. No equations or procedures are presented that reduce any claimed prediction, pseudo-label, or performance gain to a quantity defined by the method's own fitted parameters or prior outputs. The central claims rest on integration of independent external components (black-box source model and ViL) plus benchmark experiments, with no self-definitional loops, fitted-input predictions, or load-bearing self-citations that collapse the result to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated in the provided text. Any parameters (e.g., fusion coefficients) are inferred as typical training hyperparameters rather than load-bearing inventions.

axioms (1)
  • domain assumption: Vision-language models supply complementary language-aligned semantic priors that can be fused with black-box task predictions.
    Invoked in the abstract as the basis for bridging semantic gaps.
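
For concreteness, the priors this axiom refers to are typically obtained via CLIP-style zero-shot classification. A minimal sketch using the OpenAI `clip` package; the paper's exact ViL configuration and prompt templates may differ, and the class names below are placeholders.

```python
# Zero-shot language-aligned priors from CLIP; class names are placeholders.
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
classes = ["backpack", "bike", "monitor"]  # placeholder target class names
text = clip.tokenize([f"a photo of a {c}" for c in classes]).to(device)

@torch.no_grad()
def vil_priors(images: torch.Tensor) -> torch.Tensor:
    """images: preprocessed (batch, 3, H, W). Returns (batch, C) probabilities."""
    img_f = model.encode_image(images.to(device))
    txt_f = model.encode_text(text)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    return (100.0 * img_f @ txt_f.T).softmax(dim=-1)
```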

pith-pipeline@v0.9.0 · 5551 in / 1257 out tokens · 66638 ms · 2026-05-15T00:58:07.040914+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 3 internal anchors

  1. [1]

    Transfer adaptation learning: A decade survey

    L. Zhang and X. Gao, “Transfer adaptation learning: A decade survey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 1, pp. 23–44, 2022.

  2. [2]

    Domain-invariant feature enhancement domain adaptation for cross-scene road damage detection

    J. Li, Z. Qu, and X. Yin, “Domain-invariant feature enhancement domain adaptation for cross-scene road damage detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 36, no. 3, pp. 3466–3480, 2026.

  3. [3]

    Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation

    J. Liang, D. Hu, and J. Feng, “Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation,” in International Conference on Machine Learning. PMLR, 2020, pp. 6028–6039.

  4. [4]

    Adversarial source generation for source-free domain adaptation

    C. Cui, F. Meng, C. Zhang, Z. Liu, L. Zhu, S. Gong, and X. Lin, “Adversarial source generation for source-free domain adaptation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 6, pp. 4887–4898, 2024.

  5. [5]

    Domain-division based progressive learning for source-free domain adaptation

    P. Liu, J. Li, M. Zhao, W. Xue, Q. Hu, and S. Chen, “Domain-division based progressive learning for source-free domain adaptation,” IEEE Transactions on Multimedia, vol. 27, pp. 7081–7092, 2025.

  6. [6]

    Generative adversarial nets

    I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proceedings of the 27th International Conference on Neural Information Processing Systems, Volume 2, 2014, pp. 2672–2680.

  7. [7]

    DINE: Domain adaptation from single and multiple black-box predictors

    J. Liang, D. Hu, J. Feng, and R. He, “DINE: Domain adaptation from single and multiple black-box predictors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 8003–8013.

  8. [8]

    GPT-4 Technical Report

    J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat et al., “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023.

  9. [9]

    Distilling the Knowledge in a Neural Network

    G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, 2015.

  10. [10]

    Unsupervised domain adaptation of black-box source models

    H. Zhang, Y. Zhang, K. Jia, and L. Zhang, “Unsupervised domain adaptation of black-box source models,” in 32nd British Machine Vision Conference 2021, BMVC 2021, Online, November 22–25, 2021. BMVA Press, 2021, p. 147.

  11. [11]

    RAIN: Regularization on input and network for black-box domain adaptation

    Q. Peng, Z. Ding, L. Lyu, L. Sun, and C. Chen, “RAIN: Regularization on input and network for black-box domain adaptation,” in Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th–25th August 2023, Macao, SAR, China. ijcai.org, 2023, pp. 4118–4126.

  12. [12]

    Adversarial experts model for black-box domain adaptation

    S. Xiao, M. Ye, Q. He, S. Li, S. Tang, and X. Zhu, “Adversarial experts model for black-box domain adaptation,” in Proceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 8982–8991.

  13. [13]

    CLIP-guided black-box domain adaptation of image classification

    L. Tian, M. Ye, L. Zhou, and Q. He, “CLIP-guided black-box domain adaptation of image classification,” Signal, Image and Video Processing, vol. 18, no. 5, pp. 4637–4646, 2024.

  14. [14]

    Learning transferable visual models from natural language supervision

    A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in International Conference on Machine Learning. PMLR, 2021, pp. 8748–8763.

  15. [15]

    Learning transferable features with deep adaptation networks

    M. Long, Y. Cao, J. Wang, and M. Jordan, “Learning transferable features with deep adaptation networks,” in International Conference on Machine Learning. PMLR, 2015, pp. 97–105.

  16. [16]

    Unsupervised domain adaptation with residual transfer networks

    M. Long, H. Zhu, J. Wang, and M. I. Jordan, “Unsupervised domain adaptation with residual transfer networks,” Advances in Neural Information Processing Systems, vol. 29, 2016.

  17. [17]

    Domain-adversarial training of neural networks

    Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. March, and V. Lempitsky, “Domain-adversarial training of neural networks,” Journal of Machine Learning Research, vol. 17, no. 59, pp. 1–35, 2016.

  18. [18]

    Domain prompt tuning via meta relabeling for unsupervised adversarial adaptation

    X. Jin, C. Lan, W. Zeng, and Z. Chen, “Domain prompt tuning via meta relabeling for unsupervised adversarial adaptation,” IEEE Transactions on Multimedia, vol. 26, pp. 8333–8347, 2024.

  19. [19]

    WDAN: A weighted discriminative adversarial network with dual classifiers for fine-grained open-set domain adaptation

    J. Li, L. Yang, Q. Wang, and Q. Hu, “WDAN: A weighted discriminative adversarial network with dual classifiers for fine-grained open-set domain adaptation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 9, pp. 5133–5147, 2023.

  20. [20]

    TextAdapter: Self-supervised domain adaptation for cross-domain text recognition

    X.-Q. Liu, P.-F. Zhang, X. Luo, Z. Huang, and X.-S. Xu, “TextAdapter: Self-supervised domain adaptation for cross-domain text recognition,” IEEE Transactions on Multimedia, vol. 26, pp. 9854–9865, 2024.

  21. [21]

    Self-ensembling for visual domain adaptation

    G. French, M. Mackiewicz, and M. Fisher, “Self-ensembling for visual domain adaptation,” in International Conference on Learning Representations, 2018.

  22. [22]

    Contrastive adaptation network for unsupervised domain adaptation

    G. Kang, L. Jiang, Y. Yang, and A. G. Hauptmann, “Contrastive adaptation network for unsupervised domain adaptation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4893–4902.

  23. [23]

    Independent feature decomposition and instance alignment for unsupervised domain adaptation

    Q. He, S. Xiao, M. Ye, X. Zhu, F. Neri, and D. Hou, “Independent feature decomposition and instance alignment for unsupervised domain adaptation,” in Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023, pp. 819–827.

  24. [24]

    Enhancing multi-source open-set domain adaptation through nearest neighbor classification with self-supervised vision transformer

    J. Li, L. Yang, and Q. Hu, “Enhancing multi-source open-set domain adaptation through nearest neighbor classification with self-supervised vision transformer,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 4, pp. 2648–2662, 2024.

  25. [25]

    Progressive curriculum learning with teacher-student collaboration for source-free unsupervised domain adaptation

    Q. Tian, J. Shen, L. Kang, W. Ou, J. Wan, and Z. Lei, “Progressive curriculum learning with teacher-student collaboration for source-free unsupervised domain adaptation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 36, no. 2, pp. 1627–1639, 2026.

  26. [26]

    Consistency regularization for generalizable source-free domain adaptation

    L. Tang, K. Li, C. He, Y. Zhang, and X. Li, “Consistency regularization for generalizable source-free domain adaptation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, October 2023, pp. 4323–4333.

  27. [27]

    Source-free domain adaptation with class prototype discovery

    L. Zhou, N. Li, M. Ye, X. Zhu, and S. Tang, “Source-free domain adaptation with class prototype discovery,” Pattern Recognition, vol. 145, p. 109974, 2024.

  28. [28]

    Source-free domain adaptation via avatar prototype generation and adaptation

    Z. Qiu, Y. Zhang, H. Lin, S. Niu, Y. Liu, Q. Du, and M. Tan, “Source-free domain adaptation via avatar prototype generation and adaptation,” in International Joint Conference on Artificial Intelligence, 2021.

  29. [29]

    Source-free domain adaptation guided by vision and vision-language pre-training

    W. Zhang, L. Shen, and C.-S. Foo, “Source-free domain adaptation guided by vision and vision-language pre-training,” International Journal of Computer Vision, vol. 133, no. 2, pp. 844–866, 2025.

  30. [30]

    Proxy denoising for source-free domain adaptation

    S. Tang, W. Su, Y. Gan, M. Ye, J. D. Zhang, and X. Zhu, “Proxy denoising for source-free domain adaptation,” in The Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.net/forum?id=FIj9IEPCKr

  31. [31]

    Reviewing the forgotten classes for domain adaptation of black-box predictors

    S. Zhang, C. Shen, S. Lü, and Z. Zhang, “Reviewing the forgotten classes for domain adaptation of black-box predictors,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 15, 2024, pp. 16830–16837.

  32. [32]

    A separation and alignment framework for black-box domain adaptation

    M. Xia, J. Zhao, G. Lyu, Z. Huang, T. Hu, G. Chen, and H. Wang, “A separation and alignment framework for black-box domain adaptation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 14, 2024, pp. 16005–16013.

  33. [33]

    Black-box unsupervised domain adaptation with bi-directional Atkinson-Shiffrin memory

    J. Zhang, J. Huang, X. Jiang, and S. Lu, “Black-box unsupervised domain adaptation with bi-directional Atkinson-Shiffrin memory,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 11771–11782.

  34. [34]

    Vision-language models for vision tasks: A survey

    J. Zhang, J. Huang, S. Jin, and S. Lu, “Vision-language models for vision tasks: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 8, pp. 5625–5644, 2024.

  35. [35]

    BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation

    J. Li, D. Li, C. Xiong, and S. Hoi, “BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation,” in International Conference on Machine Learning. PMLR, 2022, pp. 12888–12900.

  36. [36]

    Reproducible scaling laws for contrastive language-image learning

    M. Cherti, R. Beaumont, R. Wightman, M. Wortsman, G. Ilharco, C. Gordon, C. Schuhmann, L. Schmidt, and J. Jitsev, “Reproducible scaling laws for contrastive language-image learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 2818–2829.

  37. [37]

    Scaling up visual and vision-language representation learning with noisy text supervision

    C. Jia, Y. Yang, Y. Xia, Y.-T. Chen, Z. Parekh, H. Pham, Q. Le, Y.-H. Sung, Z. Li, and T. Duerig, “Scaling up visual and vision-language representation learning with noisy text supervision,” in International Conference on Machine Learning. PMLR, 2021, pp. 4904–4916.

  38. [38]

    A new data augmentation method based on mixup and Dempster-Shafer theory

    Z. Zhang, H. Wang, J. Geng, X. Deng, and W. Jiang, “A new data augmentation method based on mixup and Dempster-Shafer theory,” IEEE Transactions on Multimedia, vol. 26, pp. 4998–5013, 2024.

  39. [39]

    Temporal ensembling for semi-supervised learning

    S. Laine and T. Aila, “Temporal ensembling for semi-supervised learning,” in International Conference on Learning Representations, 2017.

  40. [40]

    CLIP-Adapter: Better vision-language models with feature adapters

    P. Gao, S. Geng, R. Zhang, T. Ma, R. Fang, Y. Zhang, H. Li, and Y. Qiao, “CLIP-Adapter: Better vision-language models with feature adapters,” International Journal of Computer Vision, vol. 132, no. 2, pp. 581–595, 2024.

  41. [41]

    Learning to prompt for vision-language models

    K. Zhou, J. Yang, C. C. Loy, and Z. Liu, “Learning to prompt for vision-language models,” International Journal of Computer Vision, vol. 130, no. 9, pp. 2337–2348, 2022.

  42. [42]

    Deep clustering for unsupervised learning of visual features

    M. Caron, P. Bojanowski, A. Joulin, and M. Douze, “Deep clustering for unsupervised learning of visual features,” in Proceedings of the European Conference on Computer Vision, 2018, pp. 132–149.

  43. [43]

    Source data-absent unsupervised domain adaptation through hypothesis transfer and labeling transfer

    J. Liang, D. Hu, Y. Wang, R. He, and J. Feng, “Source data-absent unsupervised domain adaptation through hypothesis transfer and labeling transfer,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 11, pp. 8602–8617, 2021.

  44. [44]

    Maximum classifier discrepancy for unsupervised domain adaptation

    K. Saito, K. Watanabe, Y. Ushiku, and T. Harada, “Maximum classifier discrepancy for unsupervised domain adaptation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3723–3732.

  45. [45]

    Homeomorphism alignment for unsupervised domain adaptation

    L. Zhou, M. Ye, X. Zhu, S. Xiao, X.-Q. Fan, and F. Neri, “Homeomorphism alignment for unsupervised domain adaptation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 18699–18710.

  46. [46]

    Domain adaptation via prompt learning

    C. Ge, R. Huang, M. Xie, Z. Lai, S. Song, S. Li, and G. Huang, “Domain adaptation via prompt learning,” IEEE Transactions on Neural Networks and Learning Systems, 2023.

  47. [47]

    Prompt-based distribution alignment for unsupervised domain adaptation

    S. Bai, M. Zhang, W. Zhou, S. Huang, Z. Luan, D. Wang, and B. Chen, “Prompt-based distribution alignment for unsupervised domain adaptation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 2, 2024, pp. 729–737.

  48. [48]

    Bridging domain spaces for unsupervised domain adaptation

    J. Na, H. Jung, H. J. Chang, and W. Hwang, “Bridging domain spaces for unsupervised domain adaptation,” Pattern Recognition, vol. 164, p. 111537, 2025.

  49. [49]

    Attracting and dispersing: A simple approach for source-free domain adaptation

    S. Yang, S. Jui, J. Van De Weijer et al., “Attracting and dispersing: A simple approach for source-free domain adaptation,” Advances in Neural Information Processing Systems, vol. 35, pp. 5802–5815, 2022.

  50. [50]

    Source-free domain adaptation via target prediction distribution searching

    S. Tang, A. Chang, F. Zhang, X. Zhu, M. Ye, and C. Zhang, “Source-free domain adaptation via target prediction distribution searching,” International Journal of Computer Vision, vol. 132, no. 3, pp. 654–672, 2024.

  51. [51]

    Dual transferable knowledge interaction for source-free domain adaptation

    M. Zhan, Z. Wu, J. Yang, L. Peng, J. Shen, and X. Zhu, “Dual transferable knowledge interaction for source-free domain adaptation,” Information Processing & Management, vol. 63, no. 1, p. 104302, 2026.

  52. [52]

    Leveraging multi-level regularization for efficient domain adaptation of black-box predictors

    W. Li, W. Zhao, X. Pan, P. Zhou, and H. Yang, “Leveraging multi-level regularization for efficient domain adaptation of black-box predictors,” Pattern Recognition, vol. 165, p. 111611, 2025.

  53. [53]

    Learning like a real student: Black-box domain adaptation with preview, differentiated learning and review

    Q. Tian, Z. Liu, and W. Ou, “Learning like a real student: Black-box domain adaptation with preview, differentiated learning and review,” Image and Vision Computing, p. 105806, 2025.

  54. [54]

    Adapting visual category models to new domains

    K. Saenko, B. Kulis, M. Fritz, and T. Darrell, “Adapting visual category models to new domains,” in European Conference on Computer Vision. Springer, 2010, pp. 213–226.

  55. [55]

    Deep hashing network for unsupervised domain adaptation

    H. Venkateswara, J. Eusebio, S. Chakraborty, and S. Panchanathan, “Deep hashing network for unsupervised domain adaptation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5018–5027.

  56. [56]

    VisDA: The Visual Domain Adaptation Challenge

    X. Peng, B. Usman, N. Kaushik, J. Hoffman, D. Wang, and K. Saenko, “VisDA: The visual domain adaptation challenge,” arXiv preprint arXiv:1710.06924, 2017.

  57. [57]

    Deep residual learning for image recognition

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

  58. [58]

    Visualizing data using t-SNE

    L. v. d. Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. Nov, pp. 2579–2605, 2008.