pith. machine review for the scientific record.

arxiv: 2604.11484 · v1 · submitted 2026-04-13 · 💻 cs.CV

Recognition: unknown

PACO: Proxy-Task Alignment and Online Calibration for On-the-Fly Category Discovery

Pith reviewed 2026-05-10 16:44 UTC · model grok-4.3

classification 💻 cs.CV
keywords: on-the-fly category discovery · online calibration · prototype memory · novel class discovery · tree-structured decisions · open-set recognition · computer vision · streaming inference

The pith

A tree-structured decision process with proxy-initialized and online-updated thresholds improves stability in on-the-fly category discovery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing methods for on-the-fly category discovery train representations offline but then apply a single fixed threshold at inference to decide whether a sample is known, matches an existing novel class, or starts a new one. The paper claims this static approach produces inconsistent category formation because real inference is a dynamic sequence of choices that should adapt as evidence arrives. PACO instead routes each sample through a hierarchy of decisions over a growing prototype memory, initializes its thresholds by simulating the discovery process during training, and then refines those thresholds from mature novel prototypes while inference runs. The method requires no heavy training and no per-dataset tuning, so it can plug into existing pipelines. If the claim holds, models would form more reliable new categories from streaming data while still recognizing known classes across varied benchmarks.

Core claim

OCD is a dynamic process requiring continuous decisions on known-class routing, birth-aware novel assignment, and attach-versus-create operations over a dynamic prototype memory. By calibrating thresholds offline through proxy discovery simulation to align with inference needs and then updating them online from mature novel prototypes, the resulting tree-structured framework produces stable category formation without heavy retraining or dataset-specific tuning.
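
To make the offline half of this claim concrete, here is a minimal sketch of threshold initialization by proxy discovery simulation, using only the Python standard library. It assumes L2-normalized features, class-mean prototypes, and a known-versus-novel threshold chosen to maximize balanced accuracy over random pseudo-known/pseudo-novel splits of the support classes. Every name here (`proxy_calibrate`, `best_threshold`, `mean_prototype`, `n_rounds`, `novel_frac`) is hypothetical and not taken from the paper.

```python
import math
import random
from collections import defaultdict


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_prototype(vectors):
    """L2-normalized class mean (a hypothetical prototype definition)."""
    return normalize([sum(col) / len(vectors) for col in zip(*vectors)])


def best_threshold(known_scores, novel_scores):
    """Midpoint threshold maximizing balanced accuracy of 'score >= tau -> known'."""
    scores = sorted(set(known_scores) | set(novel_scores))
    candidates = [(a + b) / 2 for a, b in zip(scores, scores[1:])] or scores
    best_tau, best_acc = candidates[0], -1.0
    for tau in candidates:
        tpr = sum(s >= tau for s in known_scores) / len(known_scores)
        tnr = sum(s < tau for s in novel_scores) / len(novel_scores)
        if 0.5 * (tpr + tnr) > best_acc:
            best_tau, best_acc = tau, 0.5 * (tpr + tnr)
    return best_tau


def proxy_calibrate(feats, labels, n_rounds=5, novel_frac=0.5, seed=0):
    """Average threshold over simulated splits; assumes >= 2 samples per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for f, y in zip(feats, labels):
        by_class[y].append(normalize(f))
    classes = sorted(by_class)
    n_novel = min(len(classes) - 1, max(1, int(len(classes) * novel_frac)))
    taus = []
    for _ in range(n_rounds):
        novel = set(rng.sample(classes, n_novel))
        known = [c for c in classes if c not in novel]
        # Even-indexed samples build prototypes; odd-indexed ones are held-out known queries.
        protos = [mean_prototype(by_class[c][0::2]) for c in known]
        known_q = [v for c in known for v in by_class[c][1::2]]
        novel_q = [v for c in novel for v in by_class[c]]
        known_scores = [max(dot(q, p) for p in protos) for q in known_q]
        novel_scores = [max(dot(q, p) for p in protos) for q in novel_q]
        taus.append(best_threshold(known_scores, novel_scores))
    return sum(taus) / len(taus)


# Toy support set: 4 classes in 4-D, two slightly perturbed samples per class.
feats, labels = [], []
for c in range(4):
    a = [0.0] * 4
    a[c], a[(c + 1) % 4] = 1.0, 0.1
    b = [0.0] * 4
    b[c], b[(c + 3) % 4] = 1.0, 0.1
    feats += [a, b]
    labels += [c, c]
tau = proxy_calibrate(feats, labels)
```

On this toy support set the returned threshold lands between the pseudo-novel similarities (at most about 0.2) and the held-out pseudo-known similarities (about 0.99). Averaging over several random splits, rather than trusting a single one, is the sketch's stand-in for avoiding per-dataset tuning.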

What carries the argument

The support-set-calibrated tree-structured online decision framework that sequences known-class routing, birth-aware novel assignment, and attach-versus-create operations over a dynamic prototype memory, with thresholds initialized by proxy simulation and updated from mature novel prototypes.
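
The decision tree can be sketched as follows. This is a simplified stand-in, not the authors' implementation: it assumes cosine similarity on L2-normalized features, one threshold for known-class routing (`tau_known`), one for attaching to an existing novel prototype (`tau_attach`), and a novel prototype kept as a running sum of its members. The paper's birth-aware assignment step is not modeled because the summary does not define it; all class and field names are hypothetical.

```python
import math
from dataclasses import dataclass, field


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


@dataclass
class NovelPrototype:
    center: list      # running sum of unit-normalized member features
    count: int = 1    # number of samples assigned so far

    def direction(self):
        return normalize(self.center)


@dataclass
class OnlineDecider:
    known_protos: dict   # label -> unit prototype computed on the support set
    tau_known: float     # hypothetical known-class routing threshold
    tau_attach: float    # hypothetical attach-versus-create threshold
    novel: list = field(default_factory=list)  # dynamic novel prototype memory

    def step(self, x):
        x = normalize(x)
        # Level 1: known-class routing against support-set prototypes.
        label, s_known = max(
            ((y, dot(x, p)) for y, p in self.known_protos.items()), key=lambda t: t[1]
        )
        if s_known >= self.tau_known:
            return ("known", label)
        # Level 2: attach to the closest novel prototype if it is similar enough.
        if self.novel:
            idx, s_novel = max(
                ((i, dot(x, p.direction())) for i, p in enumerate(self.novel)),
                key=lambda t: t[1],
            )
            if s_novel >= self.tau_attach:
                proto = self.novel[idx]
                proto.center = [c + xi for c, xi in zip(proto.center, x)]
                proto.count += 1
                return ("attach", idx)
        # Level 3: otherwise create a new novel prototype from this sample.
        self.novel.append(NovelPrototype(center=list(x)))
        return ("create", len(self.novel) - 1)


decider = OnlineDecider(
    known_protos={"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]},
    tau_known=0.8,
    tau_attach=0.8,
)
stream = [[1.0, 0.1, 0.0], [0.0, 0.0, 1.0], [0.1, 0.0, 1.0], [0.7, 0.7, 0.1]]
decisions = [decider.step(x) for x in stream]
```

Here the first sample is routed to "cat", the second starts a novel prototype, the third attaches to it, and the fourth (ambiguous between the known classes, dissimilar to the novel one) starts a second novel prototype. A single fixed threshold would collapse levels 2 and 3 into one test, which is the behavior the paper argues against.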

If this is right

  • Existing OCD pipelines gain an inference-time module that improves known and novel class handling without retraining the underlying representation.
  • Thresholds adapt continuously during inference, reducing the inconsistency that arises from static boundaries.
  • No dataset-specific tuning is required, so the same framework can be deployed across different streaming benchmarks.
  • Dynamic prototype memory supports attach-versus-create decisions that keep category formation coherent as new samples arrive.
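
Online refinement from mature novel prototypes might look like the sketch below. The maturity rule (at least `min_count` members), the statistic (mean member-to-prototype similarity recorded at attach time), the `margin`, and the exponential-moving-average step with `momentum` are all assumptions chosen for illustration; the summary does not say how PACO defines maturity or performs the update.

```python
from dataclasses import dataclass


@dataclass
class ProtoStats:
    count: int = 0
    sim_sum: float = 0.0  # sum of member-to-prototype cosine similarities at attach time

    def add(self, sim):
        self.count += 1
        self.sim_sum += sim

    @property
    def mean_sim(self):
        return self.sim_sum / self.count if self.count else 0.0


def update_attach_threshold(tau, protos, min_count=5, momentum=0.9, margin=0.05):
    """Move tau toward (mean similarity of mature prototypes - margin) by one EMA step."""
    mature = [p for p in protos if p.count >= min_count]
    if not mature:
        return tau  # no mature evidence yet: keep the offline-initialized value
    target = sum(p.mean_sim for p in mature) / len(mature) - margin
    return momentum * tau + (1.0 - momentum) * target


mature_proto, young_proto = ProtoStats(), ProtoStats()
for _ in range(5):
    mature_proto.add(0.9)
young_proto.add(0.1)  # ignored below: fewer than min_count members
new_tau = update_attach_threshold(0.5, [mature_proto, young_proto])
```

Because only mature prototypes contribute, a burst of freshly created singletons cannot drag the threshold, and the momentum term keeps the offline initialization dominant until enough evidence accumulates.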

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same proxy-simulation plus online-update pattern could be tested in other streaming recognition settings where decision boundaries must evolve without full retraining.
  • If the attach-versus-create logic generalizes, it might reduce the size of the initial support set needed for reliable open-world performance.
  • Longer real-world video streams with many novel classes arriving at irregular intervals would provide a direct test of whether mature-prototype updates prevent drift.

Load-bearing premise

Thresholds calibrated offline by simulating the proxy discovery process will align with the changing needs of real-time inference and produce stable categories when they are updated from mature novel prototypes without any dataset-specific adjustments.

What would settle it

Apply the framework to a long streaming sequence containing gradually introduced novel classes and measure whether the number and purity of formed categories remain consistent over time; if performance falls to the level of fixed-threshold baselines or if clusters fragment, the claim would be refuted.
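
This test needs two quantities tracked along the stream: how many categories have been formed and how pure they are. A minimal sketch, assuming purity is the usual majority-label fraction and both quantities are computed cumulatively at the end of each fixed-size window (function names are hypothetical):

```python
from collections import Counter, defaultdict


def purity(pred, true):
    """Fraction of samples whose ground-truth label is the majority label of their cluster."""
    clusters = defaultdict(list)
    for p, t in zip(pred, true):
        clusters[p].append(t)
    majority = sum(Counter(ts).most_common(1)[0][1] for ts in clusters.values())
    return majority / len(pred)


def stream_report(pred, true, window):
    """(samples seen, clusters formed, purity) after each full window; a partial tail is skipped."""
    report = []
    for end in range(window, len(pred) + 1, window):
        report.append((end, len(set(pred[:end])), purity(pred[:end], true[:end])))
    return report


pred = [0, 0, 1, 1, 1, 2]
true = ["a", "a", "b", "b", "a", "c"]
report = stream_report(pred, true, window=3)
```

A cluster count that keeps rising while purity falls would indicate fragmentation or drift; a count that stays near the true number of classes with high purity would support the claim.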

Figures

Figures reproduced from arXiv: 2604.11484 by Bohan Zhang, Weidong Tang, Yanan Wu, Yang Wang, Zhixiang Chi, ZiZhang Wu.

Figure 1. Overview of the proposed framework. The model first learns a spherical representation from the support set. At test …

Figure 2. Hyperparameter sensitivity analysis of m, λ_stat, α, and β. PACO exhibits robust performance across a wide range of settings, with default configurations (dashed lines) situated in stable regions for all metrics. While SCars New accuracy is sensitive to m > 0.6, the model remains remarkably stable across variations in λ_stat, α, and β. These results validate the reliability and generalizability of our defa…

Figure 3. Visualization of support-set calibration on Stanford …

Figure 4. Strict–Hungarian All/Old/New accuracy under dif…

Figure 6. Per-sample latency of the inference-time deci…
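
Figure 4 reports Strict–Hungarian All/Old/New accuracy. Assuming the convention common in OCD evaluation, where a single one-to-one cluster-to-class matching is found over all test samples and Old/New accuracies are then read off the known-class and novel-class subsets, the metric can be sketched as below. The brute-force matching only suits toy inputs; at realistic sizes one would use a Hungarian solver such as `scipy.optimize.linear_sum_assignment`. Function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def best_matching(pred, true):
    """Brute-force one-to-one cluster -> class map maximizing agreement (toy sizes only)."""
    clusters = sorted(set(pred))
    classes = sorted(set(true))
    # Pad with None so surplus clusters can stay unmatched (their samples count as errors).
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    counts = Counter(zip(pred, true))
    best, best_score = {}, -1
    for perm in permutations(padded, len(clusters)):
        score = sum(counts[(c, t)] for c, t in zip(clusters, perm))
        if score > best_score:
            best, best_score = dict(zip(clusters, perm)), score
    return best


def strict_hungarian_acc(pred, true, old_classes):
    """All/Old/New accuracy under one matching computed on all samples jointly."""
    mapping = best_matching(pred, true)
    hits = [mapping[p] == t for p, t in zip(pred, true)]

    def frac(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    old = [h for h, t in zip(hits, true) if t in old_classes]
    new = [h for h, t in zip(hits, true) if t not in old_classes]
    return frac(hits), frac(old), frac(new)


pred = [0, 0, 1, 1, 2, 2]
true = ["a", "a", "b", "b", "c", "b"]
all_acc, old_acc, new_acc = strict_hungarian_acc(pred, true, old_classes={"a"})
```

Under this assumed convention, computing the matching once over all samples means a cluster cannot be rematched separately for the Old and New subsets, so mixed clusters are penalized in whichever subset they absorb.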
Original abstract

On-the-Fly Category Discovery (OCD) requires a model, trained on an offline support set, to recognize known classes while discovering new ones from an online streaming sequence. Existing methods focus heavily on offline training. They aim to learn discriminative representations on the support set so that novel classes can be separated at test time. However, their discovery mechanism at inference is typically reduced to a single threshold. We argue that this paradigm is fundamentally flawed as OCD is not a static classification problem, but a dynamic process. The model must continuously decide 1) whether a sample belongs to a known class, 2) matches an existing novel category, or 3) should initiate a new one. Moreover, prior methods treat the support set as fixed knowledge. They do not update their decision boundaries as new evidence arrives during inference. This leads to unstable and inconsistent category formation. Our experiments confirm these issues. With properly calibrated and adaptive thresholds, substantial improvements can be achieved, even without changing the representation. Motivated by this, we propose PACO, a support-set-calibrated, tree-structured online decision framework. The framework models inference as a sequence of hierarchical decisions, including known-class routing, birth-aware novel assignment, and attach-versus-create operations over a dynamic prototype memory. Furthermore, we simulate the proxy discovery process to initialize the thresholds during offline training to align with inference. Thresholds are continuously updated during inference using mature novel prototypes. Importantly, PACO requires no heavy training and no dataset-specific tuning. It can be directly integrated into existing OCD pipelines as an inference-time module. Extensive experiments show significant improvements over SOTA baselines across seven benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper claims that single-threshold inference in on-the-fly category discovery (OCD) is fundamentally flawed for the dynamic, multi-way decisions required (known-class routing, matching existing novel categories, or creating new ones) and that prior methods fail to update boundaries as evidence arrives during streaming inference. It proposes PACO, a support-set-calibrated tree-structured online decision framework using dynamic prototype memory, with thresholds initialized offline by simulating the proxy discovery process and continuously updated online from mature novel prototypes. The method is presented as a lightweight inference-time module integrable into existing OCD pipelines without heavy retraining or dataset-specific tuning, and experiments across seven benchmarks are said to show significant improvements over SOTA baselines.

Significance. If the reported gains prove robust, this could meaningfully advance OCD research by redirecting attention from representation learning alone to inference-time hierarchical calibration and adaptive thresholds. The practical framing as a plug-in module that improves stability without retraining is a clear strength, and the proxy-simulation idea for threshold alignment offers a plausible way to bridge offline training and online streaming if the alignment holds empirically.

major comments (2)
  1. §3 (method description): The central claim that offline simulation of proxy discovery produces thresholds aligned with online inference needs explicit validation; without an ablation comparing simulated initialization against fixed or random thresholds (and reporting the resulting impact on category stability and accuracy), the assertion of no dataset-specific tuning remains untested and load-bearing for the no-tuning guarantee.
  2. §4 (experiments): The abstract and results claim substantial improvements even without representation changes, yet no quantitative deltas, baseline details, error bars, or statistical significance tests are referenced for the seven benchmarks; this undermines assessment of whether the hierarchical decisions and online updates are the true source of gains versus variance or implementation details.
minor comments (3)
  1. The abstract would be strengthened by including at least one concrete performance number or benchmark name to ground the 'significant improvements' statement.
  2. Introduce formal notation or pseudocode for the attach-versus-create decision rule and the maturity criterion for prototype updates earlier in the method section to improve reproducibility.
  3. Ensure consistent use of terms such as 'mature novel prototypes' with a precise definition (e.g., sample count or confidence threshold) to avoid ambiguity in the online update procedure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and agree that targeted additions will strengthen the manuscript. We will revise accordingly.

Point-by-point responses
  1. Referee: §3 (method description): The central claim that offline simulation of proxy discovery produces thresholds aligned with online inference needs explicit validation; without an ablation comparing simulated initialization against fixed or random thresholds (and reporting the resulting impact on category stability and accuracy), the assertion of no dataset-specific tuning remains untested and load-bearing for the no-tuning guarantee.

    Authors: We agree that an explicit ablation would provide stronger empirical support for the alignment between the offline proxy simulation and online inference. In the revised manuscript, we will add a dedicated ablation study comparing the simulated threshold initialization against fixed and random alternatives. This study will quantify effects on category stability (measured by the consistency of novel-category assignments across streaming sequences) and discovery accuracy across the benchmarks, directly testing the no-dataset-specific-tuning property. Revision: yes.

  2. Referee: §4 (experiments): The abstract and results claim substantial improvements even without representation changes, yet no quantitative deltas, baseline details, error bars, or statistical significance tests are referenced for the seven benchmarks; this undermines assessment of whether the hierarchical decisions and online updates are the true source of gains versus variance or implementation details.

    Authors: We concur that more granular quantitative reporting is necessary to substantiate the claims. In the revision, we will expand the experimental section with full tables reporting per-benchmark deltas versus each baseline, complete implementation details for all compared methods, error bars computed over multiple random seeds, and statistical significance tests (e.g., paired t-tests with p-values) to isolate the contribution of the hierarchical decision tree and online prototype updates from other sources of variation. Revision: yes.
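
The paired test promised in this response compares per-seed scores of PACO and a baseline evaluated on the same splits. A standard-library sketch of the paired t statistic follows; the scores in the usage lines are made-up placeholders, not results from the paper. The p-value would come from Student's t distribution with `df` degrees of freedom, for example via `scipy.stats.ttest_rel(method, baseline)`, which computes the same statistic.

```python
import math
import statistics


def paired_t(baseline, method):
    """Paired t statistic and degrees of freedom for per-seed scores on matched runs."""
    if len(baseline) != len(method) or len(baseline) < 2:
        raise ValueError("need at least two paired scores of equal length")
    diffs = [m - b for b, m in zip(baseline, method)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


baseline = [50.0, 52.0, 51.0]  # placeholder per-seed accuracies, not paper results
method = [53.0, 54.0, 55.0]
t_stat, df = paired_t(baseline, method)
```

Pairing by seed removes seed-to-seed variance shared by both methods, which is exactly the confound the referee raises.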

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents PACO as an inference-time module that performs hierarchical decisions over a dynamic prototype memory, with thresholds initialized by simulating the proxy discovery process offline and then updated online from mature prototypes. No equations, derivations, or self-citations are shown that reduce the claimed performance gains or category-formation stability to quantities defined by the inputs themselves. The central argument rests on the described mechanisms (known-class routing, birth-aware assignment, attach-vs-create) rather than any self-definitional loop, fitted-input-as-prediction, or load-bearing self-citation. This matches the provided reader's assessment that no self-referential reduction exists.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 2 invented entities

Ledger based solely on abstract; full paper may contain additional parameters or entities. Thresholds are calibrated via simulation rather than freely fitted ad hoc.

free parameters (1)
  • thresholds
    Initialized via proxy discovery simulation in offline training and updated online using mature novel prototypes
axioms (2)
  • domain assumption OCD inference requires continuous hierarchical decisions among assigning a sample to a known class, matching it to an existing novel category, or creating a new category
    Core motivation stated for replacing single-threshold methods with tree-structured framework
  • domain assumption Support set provides sufficient calibration signal for inference-time decisions without heavy retraining
    Basis for the support-set-calibrated and inference-time module design
invented entities (2)
  • dynamic prototype memory no independent evidence
    purpose: Stores and updates representations of novel categories to support attach-versus-create decisions during streaming
    New component introduced as part of the tree-structured online framework
  • tree-structured online decision framework no independent evidence
    purpose: Organizes the sequence of known-class routing, birth-aware assignment, and attach-versus-create operations
    Core proposed architecture for PACO

pith-pipeline@v0.9.0 · 5613 in / 1680 out tokens · 50633 ms · 2026-05-10T16:44:00.625535+00:00 · methodology


Reference graph

Works this paper leans on

70 extracted references · 6 canonical work pages · 1 internal anchor