arxiv: 2604.11195 · v1 · submitted 2026-04-13 · 💻 cs.CV · cs.AI

Towards Adaptive Open-Set Object Detection via Category-Level Collaboration Knowledge Mining

Pith reviewed 2026-05-10 15:55 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords adaptive open-set object detection · category-level knowledge mining · domain adaptation · object detection · unsupervised clustering · memory bank · novel category adaptation

The pith

A clustering-based memory bank mines category knowledge from source features to adapt detectors to novel categories without target labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method for adaptive open-set object detection where a model is trained only on labeled base categories in a source domain and must then detect both base and entirely new categories in a target domain with no annotations available. It builds and maintains a memory bank by running unsupervised clustering on source features to capture class prototypes along with intra-class variations, then selects source features relevant to novel categories to initialize their classifiers. An adaptive assignment step moves this category-level knowledge to the target domain while the memory bank updates asynchronously to reduce source bias. A sympathetic reader would care because reliable adaptation without target labels would let detection systems handle emerging object types in new environments using only existing source data.

Core claim

By constructing a clustering-based memory bank to encode class prototypes, auxiliary features, and intra-class disparity information from the source domain, iteratively updating it via unsupervised clustering, applying a base-to-novel selection metric to initialize novel classifiers, and using adaptive feature assignment with asynchronous updates, the method transfers category-level knowledge to improve detection of both known and novel categories in the target domain.
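
A minimal sketch of how such a bank might be laid out, using only the Python standard library. It assumes k-means as the clustering step (the abstract says only "unsupervised clustering"), takes the class mean as the prototype, the cluster centroids as the auxiliary features, and the mean distance to the prototype as the disparity term. The names `kmeans` and `build_memory_bank` and all three definitions are illustrative assumptions, not the paper's formulation.

```python
import math
import random

def mean_vec(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering step); returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: dist(v, centroids[j]))
            groups[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean_vec(g) if g else centroids[j]
                     for j, g in enumerate(groups)]
    return centroids

def build_memory_bank(features_by_class, k=2):
    """Hypothetical bank: one entry per base class."""
    bank = {}
    for label, feats in features_by_class.items():
        proto = mean_vec(feats)
        bank[label] = {
            "prototype": proto,                # class mean
            "auxiliary": kmeans(feats, k),     # cluster centroids
            "disparity": sum(dist(f, proto) for f in feats) / len(feats),
        }
    return bank

# Toy 2-D features standing in for detector embeddings.
source = {
    "car": [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]],
    "person": [[10.0, 10.0], [10.0, 12.0]],
}
bank = build_memory_bank(source, k=2)
```

Re-running `build_memory_bank` on refreshed source features would be one simple way to realize the "iteratively updating" part of the claim; the paper may update the bank incrementally instead.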

What carries the argument

The clustering-based memory bank, which encodes class prototypes, auxiliary features, and intra-class disparity information and is updated iteratively by unsupervised clustering on source features and asynchronously for target-domain transfer.
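
The asynchronous part can be pictured with a small sketch. It assumes that "asynchronous" means the prototype is refreshed only every few steps from buffered target features, and that the refresh is a momentum (exponential moving average) blend. Both the schedule and the blend, along with the class name `AsyncPrototype`, are assumptions for illustration; the abstract does not specify the rule.

```python
class AsyncPrototype:
    """Prototype refreshed only every `interval` steps from buffered features.

    The interval schedule and the momentum rule are illustrative assumptions,
    not the paper's stated procedure.
    """

    def __init__(self, prototype, momentum=0.9, interval=3):
        self.prototype = list(prototype)
        self.momentum = momentum
        self.interval = interval
        self.buffer = []
        self.step = 0

    def observe(self, feature):
        """Buffer one assigned feature; refresh the prototype on schedule."""
        self.buffer.append(list(feature))
        self.step += 1
        if self.step % self.interval == 0:
            self._refresh()

    def _refresh(self):
        """Blend the old prototype with the mean of the buffered features."""
        n = len(self.buffer)
        batch_mean = [sum(col) / n for col in zip(*self.buffer)]
        m = self.momentum
        self.prototype = [m * p + (1 - m) * b
                          for p, b in zip(self.prototype, batch_mean)]
        self.buffer.clear()

proto = AsyncPrototype([0.0, 0.0], momentum=0.5, interval=2)
proto.observe([2.0, 0.0])            # buffered only, prototype unchanged
after_one = list(proto.prototype)
proto.observe([4.0, 0.0])            # second step triggers the refresh
```

Delaying the refresh means the detector trains against a prototype that lags the newest target features, which is one plausible reading of how source bias could be eased without destabilizing training.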

If this is right

  • The method surpasses prior adaptive open-set object detection approaches by 1.1-5.5 mAP across multiple benchmarks.
  • Asynchronous memory bank updates reduce source-domain feature bias during target adaptation.
  • A base-to-novel selection metric identifies source features useful for initializing novel-category classifiers.
  • Inter-class and intra-class relationships mined from the source domain strengthen cross-domain representations for both base and novel categories.
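
The base-to-novel selection step can be illustrated with a stand-in metric, since the abstract does not give the actual one. The sketch below scores each candidate source feature by its distance to the nearest base prototype, measured in units of that class's spread, keeps the candidates that score above a threshold, and averages them to seed a novel-class weight vector. The function names, the metric, and the threshold are all hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def novelty_score(feature, bank):
    """Distance to the nearest base prototype, scaled by that class's spread.

    A stand-in for the paper's selection metric, which is not specified here.
    """
    return min(dist(feature, entry["prototype"]) / entry["disparity"]
               for entry in bank.values())

def select_novel_candidates(features, bank, threshold=2.0):
    """Keep features that sit far from every base class."""
    return [f for f in features if novelty_score(f, bank) > threshold]

def init_novel_classifier(selected):
    """Seed a novel-class weight vector with the mean of selected features."""
    n = len(selected)
    return [sum(col) / n for col in zip(*selected)]

# Hypothetical bank entries and candidate features (toy 2-D values).
bank = {
    "car": {"prototype": [0.0, 0.0], "disparity": 1.0},
    "person": {"prototype": [10.0, 0.0], "disparity": 2.0},
}
candidates = [[0.5, 0.0], [5.0, 0.0], [5.0, 4.0]]
selected = select_novel_candidates(candidates, bank)
novel_weight = init_novel_classifier(selected)
```

Here the first candidate lies close to the "car" prototype and is dropped, while the other two are kept. A distance-only score like this cannot tell a genuine novel object from background clutter, which is the concern the referee report raises about spurious correlations.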

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same memory-bank construction could be tested on unsupervised domain adaptation for tasks such as semantic segmentation or image classification.
  • If the clustering step proves robust, the approach might support zero-shot or few-shot extensions where even fewer source examples are available.
  • Applying the technique to video object detection could add temporal consistency constraints to the memory bank updates.

Load-bearing premise

Unsupervised clustering on source features can reliably encode class prototypes and intra-class disparity information that transfers effectively to novel categories in the target domain without any target annotations.
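
One way a reader could probe this premise is to measure how well separated the clusters are. The reference list includes Rousseeuw's silhouette paper [29], though how the authors use it is not stated here. The sketch below computes the mean silhouette coefficient in plain Python for toy one-dimensional points; it is a generic diagnostic, not the paper's validation procedure.

```python
import math

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two distinct labels."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

points = [[0.0], [1.0], [10.0], [11.0]]
tight = silhouette(points, [0, 0, 1, 1])   # labels follow the two groups
mixed = silhouette(points, [0, 1, 0, 1])   # labels cut across the groups
```

Values near 1 indicate compact, well-separated clusters, and negative values indicate points that sit closer to another cluster than to their own. A good silhouette on source features would still say nothing about transfer to novel target categories, so it supports the premise only partially.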

What would settle it

If the method shows no mAP gain or a performance drop for novel categories on a benchmark where source and target domains have large visual distribution shifts, the transferability of the mined knowledge would be refuted.

Figures

Figures reproduced from arXiv: 2604.11195 by Junjie Ke, Lihuo He, Lizhi Wang, Xinbo Gao, Yuqi Ji.

Figure 1: Illustration of (a) existing DAOD task, (b) OSOD task and (c) AOOD
Figure 2: Conceptual Visualization of the gap between mean features of base
Figure 3: An overview of the proposed method. In each mini-batch, images from the source domain (with ground truth annotations) and the target domain are
Figure 4: Update procedure of the clustering-based memory bank (CMB). (a)
Figure 5: Illustration of the feature distributions for base and novel categories
Figure 6: The boxplot for Pascal VOC → Clipart, where the blue dots • and the red stars ⋆ indicate the performance of the baseline and CCKM, respectively.
… is reduced to 0.746, and AOSE drops to 3570, underscoring its effectiveness in handling infrequent novel instances while maintaining base class precision. In the freq-inc setting, frequent novel occurrences intensify novel–background ambiguity, leading prior meth…
Figure 7: Sensitivity analysis of the hyperparameter
Figure 8: t-SNE visualization of object query features on Cityscapes
Figure 9: Visualization of detection results in Cityscapes [36]
Original abstract

Existing object detectors often struggle to generalize across domains while adapting to emerging novel categories. Adaptive open-set object detection (AOOD) addresses this challenge by training on base categories in the source domain and adapting to both base and novel categories in the target domain without target annotations. However, current AOOD methods remain limited by weak cross-domain representations, ambiguity among novel categories, and source-domain feature bias. To address these issues, we propose a category-level collaboration knowledge mining strategy that exploits both inter-class and intra-class relationships across domains. Specifically, we construct a clustering-based memory bank to encode class prototypes, auxiliary features, and intra-class disparity information, and iteratively update it via unsupervised clustering to enhance category-level knowledge representation. We further design a base-to-novel selection metric to discover source-domain features related to novel categories and use them to initialize novel-category classifiers. In addition, an adaptive feature assignment strategy transfers the learned category-level knowledge to the target domain and asynchronously updates the memory bank to alleviate source-domain bias. Extensive experiments on multiple benchmarks show that our method consistently surpasses state-of-the-art AOOD methods by 1.1-5.5 mAP.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a category-level collaboration knowledge mining strategy for adaptive open-set object detection (AOOD). It constructs a clustering-based memory bank from source-domain features to encode class prototypes, auxiliary features, and intra-class disparity information, iteratively updated via unsupervised clustering. A base-to-novel selection metric identifies source features related to novel categories to initialize novel classifiers. An adaptive feature assignment transfers the knowledge to the target domain (without annotations) while asynchronously updating the memory bank to reduce source bias. Extensive experiments on multiple benchmarks are reported to show consistent 1.1-5.5 mAP gains over prior AOOD methods.

Significance. If the reported gains prove robust, the work would advance AOOD by demonstrating how unsupervised source-only clustering and adaptive cross-domain assignment can mitigate weak representations, novel-category ambiguity, and source bias without target labels. The iterative memory-bank update and base-to-novel selection are concrete mechanisms that could be useful if they reliably encode transferable prototypes. The absence of target supervision makes the approach ambitious, but the empirical claims require strong validation of the clustering transfer step.

major comments (2)
  1. [§3] §3 (Method, clustering-based memory bank and base-to-novel selection): the central performance claim (1.1-5.5 mAP gains) rests on the assumption that unsupervised clustering of source features alone can produce reliable prototypes and intra-class disparity information that transfers to unseen novel categories under domain shift. No ablation or analysis is shown demonstrating that the selection metric identifies features genuinely related to novel classes rather than spurious correlations; this is load-bearing because novel classes are definitionally absent from source labels.
  2. [Experimental results] Experimental results section (tables reporting mAP): the abstract and results claim consistent outperformance, yet no details are provided on statistical significance testing, number of random seeds/runs, or controls for confounding factors such as hyper-parameter sensitivity in the clustering step. This weakens confidence that the gains are attributable to the proposed knowledge-mining components rather than implementation details.
minor comments (2)
  1. The abstract and method overview would benefit from explicit naming of the benchmarks and datasets used, as well as a brief statement of the open-set protocol (e.g., which categories are base vs. novel).
  2. Notation for the memory bank contents (prototypes, auxiliary features, disparity) could be formalized with a small equation or diagram to improve clarity for readers.
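
One possible shape for the notation requested in minor comment 2 is sketched below. Every symbol is an assumption introduced here for illustration: $\mathcal{F}_c$ is the set of source features for base class $c$, $C$ the number of base classes, $K$ the number of clusters per class, $\mathbf{p}_c$ the prototype, $\mathcal{A}_c$ the auxiliary features (taken to be cluster centroids), and $d_c$ the intra-class disparity (taken to be the mean distance to the prototype). The paper's own definitions may differ.

```latex
\[
\mathcal{M} = \bigl\{ (\mathbf{p}_c,\ \mathcal{A}_c,\ d_c) \bigr\}_{c=1}^{C},
\qquad
\mathbf{p}_c = \frac{1}{|\mathcal{F}_c|} \sum_{\mathbf{f} \in \mathcal{F}_c} \mathbf{f},
\qquad
\mathcal{A}_c = \{ \mathbf{a}_{c,1}, \dots, \mathbf{a}_{c,K} \},
\qquad
d_c = \frac{1}{|\mathcal{F}_c|} \sum_{\mathbf{f} \in \mathcal{F}_c}
      \lVert \mathbf{f} - \mathbf{p}_c \rVert_2
\]
```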

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the positive evaluation of our work's potential impact on adaptive open-set object detection. We address each major comment below and describe the planned revisions to strengthen the manuscript.

point-by-point responses
  1. Referee: [§3] §3 (Method, clustering-based memory bank and base-to-novel selection): the central performance claim (1.1-5.5 mAP gains) rests on the assumption that unsupervised clustering of source features alone can produce reliable prototypes and intra-class disparity information that transfers to unseen novel categories under domain shift. No ablation or analysis is shown demonstrating that the selection metric identifies features genuinely related to novel classes rather than spurious correlations; this is load-bearing because novel classes are definitionally absent from source labels.

    Authors: We agree that additional evidence is needed to substantiate the transfer of unsupervised prototypes and intra-class disparity encodings to novel categories under domain shift. The original manuscript relies on end-to-end benchmark gains to support the approach, but we acknowledge that direct validation of the base-to-novel selection metric is warranted. In the revised version, we will add targeted ablations and analyses, including quantitative similarity measures between selected source features and target-domain novel instances, as well as t-SNE visualizations of the memory bank prototypes before and after selection. These will help demonstrate that the metric captures transferable category-level relationships rather than spurious correlations. revision: partial

  2. Referee: [Experimental results] Experimental results section (tables reporting mAP): the abstract and results claim consistent outperformance, yet no details are provided on statistical significance testing, number of random seeds/runs, or controls for confounding factors such as hyper-parameter sensitivity in the clustering step. This weakens confidence that the gains are attributable to the proposed knowledge-mining components rather than implementation details.

    Authors: We appreciate this observation on experimental rigor. The current results report single-run mAP values across benchmarks. In the revised manuscript, we will update the experimental section to include performance averaged over multiple random seeds (mean ± standard deviation over three independent runs) and will add paired statistical significance tests against the strongest baselines. We will also include a sensitivity analysis for the clustering hyper-parameters (e.g., number of clusters and memory update frequency) to show that the reported gains remain stable across reasonable settings, thereby increasing confidence that improvements stem from the proposed category-level collaboration mechanisms. revision: yes
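
The reporting promised in the second response (mean ± standard deviation over three runs, plus a paired test against a baseline) could be computed along these lines. The numbers are placeholders chosen for easy arithmetic, not results from the paper, and the function names are illustrative.

```python
import math
import statistics

def summarize(scores):
    """Return (mean, sample standard deviation) over per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)

def paired_t_statistic(ours, baseline):
    """t statistic of the per-seed differences (degrees of freedom: n - 1)."""
    diffs = [o - b for o, b in zip(ours, baseline)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Placeholder mAP values for three seeds; NOT taken from the paper.
ours = [41.0, 42.0, 43.0]
baseline = [39.5, 40.0, 42.0]

mean, std = summarize(ours)
t = paired_t_statistic(ours, baseline)
print(f"ours: {mean:.1f} ± {std:.1f} mAP, paired t = {t:.2f}")
```

With only three runs the test has two degrees of freedom, so the statistic has to be large before a difference counts as significant; more seeds would make the comparison considerably more informative.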

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper describes a methodological pipeline of unsupervised clustering to build a memory bank, a base-to-novel selection metric, and adaptive asynchronous updates, but the provided text contains no equations, derivations, fitted-parameter predictions, or self-citation chains that reduce any claimed result to its own inputs by construction. Performance claims rest on experimental mAP gains rather than algebraic equivalence. This is the expected non-finding for an empirical method paper without load-bearing mathematical self-reference.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

With only the abstract available, specific free parameters such as clustering hyperparameters or selection thresholds are not detailed. The approach assumes standard unsupervised clustering works for category prototypes. No new entities invented beyond standard ML components.

free parameters (1)
  • clustering parameters
    Number of clusters or update rules likely tuned but not specified in abstract.
axioms (1)
  • domain assumption: Unsupervised clustering can capture meaningful class prototypes and intra-class variations across domains
    Invoked in the construction and update of the memory bank.

pith-pipeline@v0.9.0 · 5511 in / 1170 out tokens · 51396 ms · 2026-05-10T15:55:03.615788+00:00 · methodology

Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages

  1. [1]

    Retrieval-augmented open-vocabulary object detection,

    J. Kim, E. Cho, S. Kim, and H. J. Kim, “Retrieval-augmented open-vocabulary object detection,” in Proc. IEEE Comput. Vis. Pattern Recognit. (CVPR), 2024, pp. 17 427–17 436

  2. [2]

    Two-step strategy for domain adaptation retrieval,

    Y. Chen, X. Fang, Y. Liu, W. Zheng, P. Kang, N. Han, and S. Xie, “Two-step strategy for domain adaptation retrieval,” IEEE Trans. Knowl. Data Eng., vol. 36, no. 2, pp. 897–912, 2023

  3. [3]

    Pixelwise instance segmentation with a dynamically instantiated network,

    A. Arnab and P. H. Torr, “Pixelwise instance segmentation with a dynamically instantiated network,” in Proc. IEEE Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 441–450

  4. [4]

    Yolact: Real-time instance segmentation,

    D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, “Yolact: Real-time instance segmentation,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2019, pp. 9157–9166

  5. [5]

    Centermask: Real-time anchor-free instance segmentation,

    Y. Lee and J. Park, “Centermask: Real-time anchor-free instance segmentation,” in Proc. IEEE Comput. Vis. Pattern Recognit. (CVPR), 2020, pp. 13 906–13 915

  6. [6]

    Out-of-distribution semantic segmentation with disentangled and calibrated representation,

    M. Wan, K. Li, Q. Geng, B. Su, and Z. Zhou, “Out-of-distribution semantic segmentation with disentangled and calibrated representation,” IEEE Trans. Circuits Syst. Video Technol., 2025

  7. [7]

    Multi-view 3d object detection network for autonomous driving,

    X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, “Multi-view 3d object detection network for autonomous driving,” in Proc. IEEE Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 1907–1915

  8. [8]

    3D-DFM: Anchor-free multimodal 3-d object detection with dynamic fusion module for autonomous driving,

    C. Lin, D. Tian, X. Duan, J. Zhou, D. Zhao, and D. Cao, “3D-DFM: Anchor-free multimodal 3-d object detection with dynamic fusion module for autonomous driving,” IEEE Trans. Neural Netw. Learn. Syst., vol. 34, no. 12, pp. 10 812–10 822, 2023

  9. [9]

    Fsrdd: An efficient few-shot detector for rare city road damage detection,

    B. Su, H. Zhang, Z. Wu, and Z. Zhou, “Fsrdd: An efficient few-shot detector for rare city road damage detection,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 12, pp. 24 379–24 388, 2022

  10. [10]

    Pvel-ad: A large-scale open-world dataset for photovoltaic cell anomaly detection,

    B. Su, Z. Zhou, and H. Chen, “Pvel-ad: A large-scale open-world dataset for photovoltaic cell anomaly detection,” IEEE Trans. Ind. Informat., vol. 19, no. 1, pp. 404–413, 2022

  11. [11]

    Improving predictive inference under covariate shift by weighting the log-likelihood function,

    H. Shimodaira, “Improving predictive inference under covariate shift by weighting the log-likelihood function,” J. Stat. Plan. Inference, vol. 90, no. 2, pp. 227–244, 2000

  12. [12]

    Prototype-guided continual adaptation for class-incremental unsupervised domain adaptation,

    H. Lin, Y. Zhang, Z. Qiu, S. Niu, C. Gan, Y. Liu, and M. Tan, “Prototype-guided continual adaptation for class-incremental unsupervised domain adaptation,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2022, pp. 351–368

  13. [13]

    Robust object detection via adversarial novel style exploration,

    W. Wang, J. Zhang, W. Zhai, Y. Cao, and D. Tao, “Robust object detection via adversarial novel style exploration,” IEEE Trans. Image Process., vol. 31, pp. 1949–1962, 2022

  14. [14]

    SCAN++: Enhanced semantic conditioned adaptation for domain adaptive object detection,

    W. Li, X. Liu, and Y. Yuan, “SCAN++: Enhanced semantic conditioned adaptation for domain adaptive object detection,” IEEE Trans. Multimedia, vol. 25, pp. 7051–7061, 2023

  15. [15]

    SIGMA: Semantic-complete graph matching for domain adaptive object detection,

    W. Li, X. Liu, and Y. Yuan, “SIGMA: Semantic-complete graph matching for domain adaptive object detection,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 5291–5300

  16. [16]

    The overlooked elephant of object detection: Open set,

    A. Dhamija, M. Gunther, J. Ventura, and T. Boult, “The overlooked elephant of object detection: Open set,” in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV), 2020, pp. 1021–1030

  17. [17]

    Expanding low-density latent regions for open-set object detection,

    J. Han, Y. Ren, J. Ding, X. Pan, K. Yan, and G.-S. Xia, “Expanding low-density latent regions for open-set object detection,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 9591–9600

  18. [18]

    Toward generalized few-shot open-set object detection,

    B. Su, H. Zhang, J. Li, and Z. Zhou, “Toward generalized few-shot open-set object detection,” IEEE Trans. Image Process., vol. 33, pp. 1389–1402, 2024

  19. [19]

    Hsic-based moving weight averaging for few-shot open-set object detection,

    B. Su, H. Zhang, and Z. Zhou, “Hsic-based moving weight averaging for few-shot open-set object detection,” in Proc. ACM Int. Conf. Multimedia (MM’23), 2023, pp. 5358–5369

  20. [20]

    Boosting few-shot open-set object detection via prompt learning and robust decision boundary,

    Z. Wu, B. Su, Q. Geng, H. Zhang, and Z. Zhou, “Boosting few-shot open-set object detection via prompt learning and robust decision boundary,” arXiv preprint arXiv:2406.18443, 2024

  21. [21]

    Unknown sniffer for object detection: Don’t turn a blind eye to unknown objects,

    W. Liang, F. Xue, Y. Liu, G. Zhong, and A. Ming, “Unknown sniffer for object detection: Don’t turn a blind eye to unknown objects,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2023, pp. 3230–3239

  22. [22]

    Novel scenes & classes: Towards adaptive open-set object detection,

    W. Li, X. Guo, and Y. Yuan, “Novel scenes & classes: Towards adaptive open-set object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2023, pp. 15 780–15 790

  23. [23]

    Deformable DETR: Deformable transformers for end-to-end object detection,

    X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, “Deformable DETR: Deformable transformers for end-to-end object detection,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2020

  24. [24]

    RPN prototype alignment for domain adaptive object detector,

    Y. Zhang, Z. Wang, and Y. Mao, “RPN prototype alignment for domain adaptive object detector,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 12 425–12 434

  25. [25]

    Knowledge mining and transferring for domain adaptive object detection,

    K. Tian, C. Zhang, Y. Wang, S. Xiang, and C. Pan, “Knowledge mining and transferring for domain adaptive object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2021, pp. 9133–9142

  26. [26]

    Towards open-set object detection and discovery,

    J. Zheng, W. Li, J. Hong, L. Petersson, and N. Barnes, “Towards open-set object detection and discovery,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 3961–3970

  27. [27]

    Towards open world object detection,

    K. Joseph, S. Khan, F. S. Khan, and V. N. Balasubramanian, “Towards open world object detection,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 5830–5840

  28. [28]

    A prototype-oriented framework for unsupervised domain adaptation,

    K. Tanwisuth, X. Fan, H. Zheng, S. Zhang, H. Zhang, B. Chen, and M. Zhou, “A prototype-oriented framework for unsupervised domain adaptation,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2021, pp. 17 194–17 208

  29. [29]

    Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,

    P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” J. Comput. Appl. Math., vol. 20, pp. 53–65, 1987

  30. [30]

    Comparing clusterings by the variation of information,

    M. Meilă, “Comparing clusterings by the variation of information,” in Proc. Annu. Conf. Learn. Theory Kernel Workshop (COLT/Kernel), 2003, pp. 173–187

  31. [31]

    Exploiting the intrinsic neighborhood structure for source-free domain adaptation,

    S. Yang, J. Van de Weijer, L. Herranz, S. Jui et al., “Exploiting the intrinsic neighborhood structure for source-free domain adaptation,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2021, pp. 29 393–29 405

  32. [32]

    Separate to adapt: Open set domain adaptation via progressive separation,

    H. Liu, Z. Cao, M. Long, J. Wang, and Q. Yang, “Separate to adapt: Open set domain adaptation via progressive separation,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 2927–2936

  33. [33]

    MeGA-CDA: Memory guided attention for category-aware unsupervised domain adaptive object detection,

    V. Vs, V. Gupta, P. Oza, V. A. Sindagi, and V. M. Patel, “MeGA-CDA: Memory guided attention for category-aware unsupervised domain adaptive object detection,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 4516–4526

  34. [34]

    Cross-domain object detection through coarse-to-fine feature adaptation,

    Y. Zheng, D. Huang, S. Liu, and Y. Wang, “Cross-domain object detection through coarse-to-fine feature adaptation,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2020, pp. 13 766–13 775

  35. [35]

    BDD100k: A diverse driving dataset for heterogeneous multitask learning,

    F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, and T. Darrell, “BDD100k: A diverse driving dataset for heterogeneous multitask learning,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2020, pp. 2636–2645

  36. [36]

    The cityscapes dataset for semantic urban scene understanding,

    M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” in Proc. IEEE Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 3213–3223

  37. [37]

    Semantic foggy scene understanding with synthetic data,

    C. Sakaridis, D. Dai, and L. Van Gool, “Semantic foggy scene understanding with synthetic data,” Int. J. Comput. Vis., vol. 126, pp. 973–992, 2018

  38. [38]

    The pascal visual object classes challenge: A retrospective,

    M. Everingham, S. A. Eslami, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes challenge: A retrospective,” Int. J. Comput. Vis., vol. 111, pp. 98–136, 2015

  39. [39]

    Cross-domain weakly-supervised object detection through progressive domain adaptation,

    N. Inoue, R. Furuta, T. Yamasaki, and K. Aizawa, “Cross-domain weakly-supervised object detection through progressive domain adaptation,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 5001–5009

  40. [40]

    Domain adaptive faster R-CNN for object detection in the wild,

    Y. Chen, W. Li, C. Sakaridis, D. Dai, and L. Van Gool, “Domain adaptive faster R-CNN for object detection in the wild,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 3339–3348

  41. [41]

    Multi-adversarial faster-rcnn for unrestricted object detection,

    Z. He and L. Zhang, “Multi-adversarial faster-rcnn for unrestricted object detection,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 6668–6677

  42. [42]

    Adapting object detectors via selective cross-domain alignment,

    X. Zhu, J. Pang, C. Yang, J. Shi, and D. Lin, “Adapting object detectors via selective cross-domain alignment,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 687–696

  43. [43]

    SIGMA++: Improved semantic-complete graph matching for domain adaptive object detection,

    W. Li, X. Liu, and Y. Yuan, “SIGMA++: Improved semantic-complete graph matching for domain adaptive object detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 7, pp. 9022–9040, 2023

  44. [44]

    Learning open set network with discriminative reciprocal points,

    G. Chen, L. Qiao, Y. Shi, P. Peng, J. Li, T. Huang, S. Pu, and Y. Tian, “Learning open set network with discriminative reciprocal points,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2020, pp. 507–522

  45. [45]

    Adversarial reciprocal points learning for open set recognition,

    G. Chen, P. Peng, X. Wang, and Y. Tian, “Adversarial reciprocal points learning for open set recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 11, pp. 8065–8081, 2021

  46. [46]

    OW-DETR: Open-world detection transformer,

    A. Gupta, S. Narayan, K. Joseph, S. Khan, F. S. Khan, and M. Shah, “OW-DETR: Open-world detection transformer,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 9235–9244

  47. [47]

    Open set domain adaptation,

    P. Panareda Busto and J. Gall, “Open set domain adaptation,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2017, pp. 754–763

  48. [48]

    Open set domain adaptation by backpropagation,

    K. Saito, S. Yamamoto, Y. Ushiku, and T. Harada, “Open set domain adaptation by backpropagation,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 156–171

  49. [49]

    On the effectiveness of image rotation for open set domain adaptation,

    S. Bucci, M. R. Loghmani, and T. Tommasi, “On the effectiveness of image rotation for open set domain adaptation,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2020, pp. 422–438

  50. [50]

    Balanced open set domain adaptation via centroid alignment,

    M. Jing, J. Li, L. Zhu, Z. Ding, K. Lu, and Y. Yang, “Balanced open set domain adaptation via centroid alignment,” in Proc. AAAI Conf. Artif. Intell. (AAAI), 2021, pp. 8013–8020

  51. [51]

    Scene parsing with global context embedding,

    W. C. Hung, Y. H. Tsai, X. Shen, Z. Lin, K. Sunkavalli, X. Lu, and M. H. Yang, “Scene parsing with global context embedding,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2017, pp. 2631–2639

  52. [52]

    Higher-order organization of complex networks,

    A. R. Benson, D. F. Gleich, and J. Leskovec, “Higher-order organization of complex networks,” Science, vol. 353, no. 6295, pp. 163–166, 2016

  53. [53]

    Feature pyramid networks for object detection,

    T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proc. IEEE Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 2117–2125

  54. [54]

    The hungarian method for the assignment problem,

    H. W. Kuhn, “The hungarian method for the assignment problem,” Naval Res. Logist., vol. 2, no. 1-2, pp. 83–97, 1955

  55. [55]

    Generalized intersection over union: A metric and a loss for bounding box regression,

    H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, “Generalized intersection over union: A metric and a loss for bounding box regression,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 658–666

  56. [56]

    Focal loss for dense object detection,

    T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2017, pp. 2980–2988

  57. [57]

    Momentum contrast for unsupervised visual representation learning,

    K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum contrast for unsupervised visual representation learning,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2020, pp. 9729–9738

  58. [58]

    Clustering with spectral norm and the k-means algorithm,

    A. Kumar and R. Kannan, “Clustering with spectral norm and the k-means algorithm,” in Proc. Annu. IEEE Symp. Found. Comput. Sci. (FOCS), 2010, pp. 299–308

  59. [59]

    MLFA: Towards realistic test time adaptive object detection by multi-level feature alignment,

    Y. Liu, J. Wang, C. Huang, Y. Wu, Y. Xu, and X. Cao, “MLFA: Towards realistic test time adaptive object detection by multi-level feature alignment,” IEEE Trans. Image Process., vol. 33, pp. 5837–5848, 2024

  60. [60]

    Towards open world recognition,

    A. Bendale and T. E. Boult, “Towards open world recognition,” in Proc. IEEE Comput. Vis. Pattern Recognit. (CVPR), 2015, pp. 1893–1902

  61. [61]

    Toward open set recognition,

    W. J. Scheirer, A. de Rezende Rocha, A. Sapkota, and T. E. Boult, “Toward open set recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 7, pp. 1757–1772, 2012

  62. [62]

    Learning placeholders for open-set recognition,

    D.-W. Zhou, H.-J. Ye, and D.-C. Zhan, “Learning placeholders for open-set recognition,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 4401–4410

  63. [63]

    Exploring sequence feature alignment for domain adaptive detection transformers,

    W. Wang, Y. Cao, J. Zhang, F. He, Z.-J. Zha, Y. Wen, and D. Tao, “Exploring sequence feature alignment for domain adaptive detection transformers,” in Proc. ACM Int. Conf. Multimedia (MM’21), 2021, pp. 1730–1738

  64. [64]

    Vector-decomposed disentanglement for domain-invariant object detection,

    A. Wu, R. Liu, Y. Han, L. Zhu, and Y. Yang, “Vector-decomposed disentanglement for domain-invariant object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2021, pp. 9342–9351

  65. [65]

    I3NET: Implicit instance-invariant network for adapting one-stage object detectors,

    C. Chen, Z. Zheng, Y. Huang, X. Ding, and Y. Yu, “I3NET: Implicit instance-invariant network for adapting one-stage object detectors,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 12 576–12 585

  66. [66]

    Strong-weak distribution alignment for adaptive object detection,

    K. Saito, Y. Ushiku, T. Harada, and K. Saenko, “Strong-weak distribution alignment for adaptive object detection,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 6956–6965

  67. [67]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 770–778

  68. [68]

    Imagenet: A large-scale hierarchical image database,

    J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in Proc. IEEE/CVF Comput. Vis. Pattern Recognit. (CVPR), 2009, pp. 248–255

  69. [69]

    DINO: Detr with improved denoising anchor boxes for end-to-end object detection,

    H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. Ni, and H.-Y. Shum, “DINO: Detr with improved denoising anchor boxes for end-to-end object detection,” in Proc. Int. Conf. Learn. Representations (ICLR), 2022, pp. 1–8

  70. [70]

    Objects365: A large-scale, high-quality dataset for object detection,

    S. Shao, Z. Li, T. Zhang, C. Peng, G. Yu, X. Zhang, J. Li, and J. Sun, “Objects365: A large-scale, high-quality dataset for object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2019, pp. 8430–8439

  71. [71]

    Decoupled weight decay regularization,

    I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in Proc. Int. Conf. Learn. Representations (ICLR), 2019. Yuqi Ji received the B.Sc. degree in Detection, Guidance and Control Technology in 2022 from Xidian University, Xi’an, China, where he is currently working toward the Ph.D. degree. His research interests include object detec...

  72. [72]

    Lihuo He (Member, IEEE) received the B.Sc

    His research interests focus on object detection and computer vision. Lihuo He (Member, IEEE) received the B.Sc. degree in electronic and information engineering and the Ph.D. degree in pattern recognition and intelligent systems from Xidian University, China, in 2008 and 2013, respectively. He is currently a Professor in the School of Electronic Engineeri...