pith. machine review for the scientific record.

arxiv: 2604.27596 · v1 · submitted 2026-04-30 · 💻 cs.CV

Recognition: unknown

SECOS: Semantic Capture for Rigorous Classification in Open-World Semi-Supervised Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 08:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords open-world semi-supervised learning · semantic alignment · novel class detection · textual label prediction · cross-modal supervision · direct classification

The pith

SECOS uses external knowledge to align semantics across modalities so models can directly predict textual labels for novel classes in open-world semi-supervised learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses a practical gap in open-world semi-supervised learning, where models must choose the best label directly from a candidate textual set for every sample, including those from unseen classes. Existing approaches leave novel-class samples without explicit supervision and produce outputs that lack semantic ties to the available labels, forcing reliance on separate post-processing steps that real applications cannot use. SECOS extracts semantic representations from external knowledge sources and aligns them with visual features for both known and novel classes, turning those alignments into direct training signals. Experiments show the resulting model selects correct labels without any post-hoc matching and still beats prior methods by as much as 5.4 percent, even when those baselines are evaluated under the more lenient post-hoc matching protocol.

Core claim

SECOS extracts and aligns semantic representations for known and novel classes using external knowledge, supplying explicit supervisory signals that let the model directly output the most relevant textual label from a candidate set for every input without any post-processing stage.
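The paper's own formulation is not reproduced here, but the behaviour the claim describes can be sketched in a CLIP-style shared embedding space: embed the image and every candidate textual label, then select the label with the highest cosine similarity, with no matching step afterwards. A minimal sketch under that assumption; the `predict_label` helper and toy vectors are illustrative, not SECOS's actual architecture:

```python
import numpy as np

def predict_label(image_emb, label_embs, labels):
    """Directly select the candidate textual label closest to the image.

    image_emb  : (d,)   visual feature for one sample
    label_embs : (k, d) one text embedding per candidate label
    labels     : list of k label strings
    """
    # L2-normalise both sides so a dot product is cosine similarity
    img = image_emb / np.linalg.norm(image_emb)
    txt = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    scores = txt @ img                      # (k,) cosine similarities
    return labels[int(np.argmax(scores))]   # argmax = direct selection, no matching step

# Toy 4-d embeddings: the image sits closest to the "zebra" label.
labels = ["horse", "zebra", "okapi"]
label_embs = np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.0, 1.0, 0.2, 0.0],
                       [0.0, 0.2, 1.0, 0.0]])
image_emb = np.array([0.1, 0.9, 0.3, 0.0])
print(predict_label(image_emb, label_embs, labels))  # zebra
```

The paper's claim is that training with externally derived semantic alignments makes this argmax land on the correct label even for classes that never receive direct human supervision.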

What carries the argument

A SECOS framework that extracts cross-modal semantic representations from external knowledge sources and uses the resulting alignments as explicit supervision for novel-class samples during training.

If this is right

  • Practical OWSSL systems can perform rigorous classification by selecting labels directly from candidate sets at inference time.
  • Novel classes receive explicit semantic supervision rather than relying solely on clustering or pseudo-labeling.
  • Performance gains of up to 5.4 percent are observed even when competing methods are allowed post-hoc label matching.
  • The need for separate post-processing modules is removed for both training and deployment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same alignment mechanism could be tested on open-world object detection or segmentation tasks where semantic consistency across modalities is also required.
  • Replacing the external knowledge source with a larger multimodal model might further tighten the alignment for rare novel classes.
  • If the external knowledge proves incomplete for certain domains, the performance gap between SECOS and post-processed baselines would shrink or reverse.

Load-bearing premise

External knowledge sources will supply semantic representations that reliably match the visual content of novel classes outside the training distribution.

What would settle it

A controlled test on a dataset whose novel classes have no usable semantic coverage in the chosen external knowledge base, measuring whether SECOS accuracy falls below that of post-processing baselines.
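In miniature, such a test could hold the direct-selection rule fixed and compare accuracy with intact versus absent semantic coverage for the novel classes. The synthetic harness below is a hypothetical sketch, not the paper's protocol; the prototypes, noise level, and `accuracy` helper are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def accuracy(img_embs, label_embs, y_true):
    """Top-1 accuracy of direct label selection by cosine similarity."""
    img = img_embs / np.linalg.norm(img_embs, axis=1, keepdims=True)
    txt = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    preds = (img @ txt.T).argmax(axis=1)
    return float((preds == y_true).mean())

# Four classes; treat the last two as "novel". Images cluster around
# their class prototype plus Gaussian noise.
protos = np.eye(4)
y = rng.integers(0, 4, size=200)
imgs = protos[y] + 0.3 * rng.normal(size=(200, 4))

# Intact coverage: every class, novel included, has a matching text embedding.
acc_full = accuracy(imgs, protos, y)

# Degraded coverage: novel-class text embeddings carry no usable semantics.
degraded = protos.copy()
degraded[2:] = rng.normal(size=(2, 4))
acc_degraded = accuracy(imgs, degraded, y)

print(acc_full, acc_degraded)  # accuracy should drop once coverage is removed
```

If SECOS's accuracy under the degraded condition fell below that of post-processing baselines on a real dataset, the load-bearing premise above would be falsified.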

Figures

Figures reproduced from arXiv: 2604.27596 by Hezhao Liu, Jiacheng Yang, Junlong Gao, Mengke Li, Shreyank N Gowda, Yang Lu, Yiqun Zhang.

Figure 1. Illustration of the limitations of existing OWSSL methods…
Figure 2. Novel Class Semantic Compensation. High-confidence…
Figure 3. Batch-Wise Semantic Recapture. It captures the latent…
Figure 4. Adapter for Semantic Feature Alignment. The adapter aligns visual features with semantic features of textual descriptions.
Figure 5. Analysis of global novel pseudo-label (p-label) density. The left two subfigures show the variation of Known, Novel, and All…
Figure 6. Effect of the adapter's downsampling dimension on…
Figure 7. Impact of batch size and pseudo-label selection on SECOS. The left two figures show that model performance remains stable…
Original abstract

In open-world semi-supervised learning (OWSSL), a model learns from labeled data and unlabeled data containing both known and novel classes. In practical OWSSL applications, models are expected to perform rigorous classification by directly selecting the most semantically relevant label from a candidate set for each sample. Existing OWSSL methods fail to achieve this because novel samples are trained without explicit supervision, and these methods lack mechanisms to extract latent semantic information, resulting in predicted labels that have no semantic correspondence to candidate textual labels. To address this, we introduce SEmantic Capture for Open-world Semi-supervised learning (SECOS), which directly predicts textual labels from the candidate set without post-processing, meeting the requirements of practical OWSSL applications. SECOS leverages external knowledge to extract and align semantic representations across modalities for both known and novel classes, providing explicit supervisory signals for training novel classes. Extensive experiments demonstrate that even when existing OWSSL methods are evaluated under the more lenient post-hoc matching setting, SECOS still surpasses them by up to 5.4% without such assistance, highlighting its superior effectiveness. Code is available at https://github.com/ganchi-huanggua/OSSL-Classification.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces SECOS for open-world semi-supervised learning (OWSSL). It claims that by using external knowledge sources to extract and align semantic representations across modalities for both known and novel classes, the method supplies explicit supervisory signals. This enables direct prediction of textual labels from a candidate set without any post-processing or post-hoc matching, unlike prior OWSSL approaches that lack mechanisms for latent semantic extraction. Experiments are reported to show that SECOS outperforms existing methods by up to 5.4% even when those baselines are evaluated under the more lenient post-hoc matching protocol.

Significance. If the central claims hold after proper validation, SECOS would address a practical gap in OWSSL by achieving rigorous, semantically grounded classification directly from candidate textual labels. Code availability is a positive factor for reproducibility. The approach's dependence on external knowledge for novel-class alignment, however, requires explicit testing of robustness to source quality and distributional mismatch before the reported gains can be considered generalizable.

major comments (3)
  1. [Experiments] Experiments section (and abstract): Performance gains of up to 5.4% are reported, yet no details are provided on experimental setup, error bars, data splits, number of runs, or the precise choice and preprocessing of external knowledge sources. Without these, the central claim of superiority under direct prediction cannot be verified or reproduced.
  2. [Method] Method section (description of semantic capture): The alignment of external knowledge representations for novel classes is treated as reliable and is used to generate explicit supervisory signals, but the manuscript contains no controlled ablation or degradation experiments on the external source (e.g., noisy embeddings, incomplete knowledge bases, or domain-shifted retrieval). This leaves the load-bearing assumption untested.
  3. [Method] §3 (or equivalent method description): The claim that SECOS 'directly predicts textual labels from the candidate set without post-processing' is central, yet the precise inference procedure, loss formulation for novel-class supervision, and how candidate-set selection is performed at test time are not specified with sufficient algorithmic detail or pseudocode to allow independent implementation.
minor comments (2)
  1. [Abstract] The abstract states that existing methods 'lack mechanisms to extract latent semantic information' but does not cite the specific prior works or sections where this limitation is demonstrated.
  2. [Method] Notation for semantic representations and alignment functions should be introduced earlier and used consistently; current description mixes 'external knowledge' with 'semantic representations' without clear definitions.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript introducing SECOS for open-world semi-supervised learning. We address each major comment point by point below, providing clarifications and indicating revisions made to strengthen the paper.

Point-by-point responses
  1. Referee: [Experiments] Experiments section (and abstract): Performance gains of up to 5.4% are reported, yet no details are provided on experimental setup, error bars, data splits, number of runs, or the precise choice and preprocessing of external knowledge sources. Without these, the central claim of superiority under direct prediction cannot be verified or reproduced.

    Authors: We agree that the initial submission lacked sufficient experimental details for full reproducibility. In the revised manuscript, we have added a new subsection to the Experiments section that specifies the full setup: dataset splits (with exact proportions for labeled/unlabeled/novel classes), number of runs (5 independent runs with mean and standard deviation), error bars on all reported metrics, and the precise external knowledge sources (e.g., specific embedding models and Wikipedia-derived corpora) along with preprocessing steps such as normalization and filtering. These additions directly support verification of the 5.4% gains under the direct-prediction protocol. revision: yes

  2. Referee: [Method] Method section (description of semantic capture): The alignment of external knowledge representations for novel classes is treated as reliable and is used to generate explicit supervisory signals, but the manuscript contains no controlled ablation or degradation experiments on the external source (e.g., noisy embeddings, incomplete knowledge bases, or domain-shifted retrieval). This leaves the load-bearing assumption untested.

    Authors: The referee correctly notes the absence of explicit robustness ablations on the external knowledge component. While our primary experiments rely on standard, domain-aligned sources, we have incorporated new controlled ablation experiments in the revised manuscript. These include tests with injected noise in embeddings and reduced knowledge-base coverage, demonstrating that performance remains competitive. We provide a brief discussion on domain shift but acknowledge that exhaustive cross-domain retrieval experiments were not feasible within the current scope; the added results nonetheless address the core concern. revision: partial

  3. Referee: [Method] §3 (or equivalent method description): The claim that SECOS 'directly predicts textual labels from the candidate set without post-processing' is central, yet the precise inference procedure, loss formulation for novel-class supervision, and how candidate-set selection is performed at test time are not specified with sufficient algorithmic detail or pseudocode to allow independent implementation.

    Authors: We thank the referee for highlighting the need for greater implementation detail. The revised manuscript expands §3 with a step-by-step description of the inference procedure for direct textual label selection, the complete loss formulation for novel-class supervision (including the specific cross-modal alignment and contrastive terms), and the exact mechanism for candidate-set construction and selection at test time. We have also inserted pseudocode for the training loop and inference pipeline. The publicly released code at the provided GitHub link already implements these components, but the paper is now self-contained for independent reproduction. revision: yes

Circularity Check

0 steps flagged

No significant circularity: SECOS relies on independent external knowledge for semantic alignment rather than self-referential definitions or fitted predictions.

Full rationale

The paper's central mechanism extracts and aligns semantic representations using external knowledge sources to supply explicit supervisory signals for novel classes, enabling direct textual label prediction from a candidate set. This does not reduce to a self-definitional loop, a fitted input renamed as prediction, or a load-bearing self-citation chain. No equations or steps in the abstract or description equate the claimed output (rigorous classification without post-processing) to the model's own parameters by construction. The 5.4% gain is presented as an empirical result against baselines under different evaluation settings, not a tautological consequence of the method's definition. The derivation remains self-contained because the external knowledge bases and alignment process are treated as independent inputs, not derived from or equivalent to the SECOS training objective itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the availability and effectiveness of external knowledge sources for semantic alignment; no specific free parameters, axioms, or invented entities are named in the abstract.

pith-pipeline@v0.9.0 · 5526 in / 1166 out tokens · 36951 ms · 2026-05-07T08:40:33.810775+00:00 · methodology

discussion (0)

