arxiv: 2604.11484 · v1 · submitted 2026-04-13 · 💻 cs.CV


PACO: Proxy-Task Alignment and Online Calibration for On-the-Fly Category Discovery

Weidong Tang , Bohan Zhang , Zhixiang Chi , ZiZhang Wu , Yang Wang , Yanan Wu


Pith reviewed 2026-05-10 16:44 UTC · model grok-4.3

classification 💻 cs.CV

keywords: on-the-fly category discovery · online calibration · prototype memory · novel class discovery · tree-structured decisions · open-set recognition · computer vision · streaming inference


The pith

A tree-structured decision process with proxy-initialized and online-updated thresholds improves stability in on-the-fly category discovery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing methods for on-the-fly category discovery train representations offline but then apply a single fixed threshold at inference to decide whether a sample is known, matches an existing novel class, or starts a new one. The paper claims this static approach produces inconsistent category formation because real inference is a dynamic sequence of choices that should adapt as evidence arrives. PACO instead routes each sample through a hierarchy of decisions over a growing prototype memory, initializes its thresholds by simulating the discovery process during training, and then refines those thresholds from mature novel prototypes while inference runs. The method adds no extra training and requires no per-dataset tuning, so it can plug into existing pipelines. If the claim holds, models would form more reliable new categories from streaming data while still recognizing known classes across varied benchmarks.

Core claim

On-the-fly category discovery (OCD) is a dynamic process that requires continuous decisions on known-class routing, birth-aware novel assignment, and attach-versus-create operations over a dynamic prototype memory. By calibrating thresholds offline through proxy discovery simulation to align with inference needs and then updating them online from mature novel prototypes, the resulting tree-structured framework produces stable category formation without heavy retraining or dataset-specific tuning.

What carries the argument

The support-set-calibrated tree-structured online decision framework that sequences known-class routing, birth-aware novel assignment, and attach-versus-create operations over a dynamic prototype memory, with thresholds initialized by proxy simulation and updated from mature novel prototypes.
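
To make the decision sequence concrete, here is a minimal sketch of a two-level routing rule over a growing prototype memory. It is an illustration under stated assumptions, not the authors' implementation: the function names, the use of cosine similarity, and the two thresholds `tau_known` and `tau_attach` are hypothetical, and the "birth-aware" step is omitted because this page does not define it.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumes a spherical embedding space)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    """Dot product of two unit vectors, which equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

def best_match(z, prototypes):
    """Return (index, similarity) of the closest prototype, or (None, -inf) if empty."""
    best_i, best_s = None, float("-inf")
    for i, p in enumerate(prototypes):
        s = cosine(z, p)
        if s > best_s:
            best_i, best_s = i, s
    return best_i, best_s

def route(z, known, novel, tau_known, tau_attach):
    """Hypothetical routing: known class first, then attach to or create a novel one."""
    z = normalize(z)
    i, s = best_match(z, known)
    if i is not None and s >= tau_known:
        return ("known", i)
    j, s = best_match(z, novel)
    if j is not None and s >= tau_attach:
        return ("attach", j)
    novel.append(z)  # the sample seeds a new novel prototype
    return ("create", len(novel) - 1)
```

The point of the sketch is only that each sample passes through an ordered sequence of checks, and that the novel memory grows as a side effect of the last branch.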

If this is right

Existing OCD pipelines gain an inference-time module that improves known and novel class handling without retraining the underlying representation.
Thresholds adapt continuously during inference, reducing the inconsistency that arises from static boundaries.
No dataset-specific tuning is required, so the same framework can be deployed across different streaming benchmarks.
Dynamic prototype memory supports attach-versus-create decisions that keep category formation coherent as new samples arrive.
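
One way to picture thresholds that adapt during inference is the sketch below. It is an assumption-laden illustration, not the paper's update rule: the maturity criterion (a minimum member count), the exponential moving average, the `margin`, and all default values are hypothetical, and the running mean is not re-normalized to the unit sphere.

```python
class NovelPrototype:
    """Hypothetical memory entry for one discovered category."""

    def __init__(self, z):
        self.mean = list(z)   # running mean of member embeddings
        self.count = 1        # number of samples assigned so far
        self.sim_sum = 0.0    # sum of similarities observed at attach time

    def attach(self, z, sim):
        """Add a sample and update the running mean incrementally."""
        self.count += 1
        self.mean = [m + (x - m) / self.count for m, x in zip(self.mean, z)]
        self.sim_sum += sim

    def is_mature(self, min_count):
        return self.count >= min_count

    def mean_sim(self):
        """Average attach-time similarity (the seed sample has none)."""
        return self.sim_sum / max(self.count - 1, 1)

def update_threshold(tau, prototypes, min_count=5, momentum=0.9, margin=0.05):
    """Move tau toward the typical within-category similarity of mature prototypes."""
    mature = [p for p in prototypes if p.is_mature(min_count)]
    if not mature:
        return tau  # no reliable evidence yet, keep the current threshold
    target = sum(p.mean_sim() for p in mature) / len(mature) - margin
    return momentum * tau + (1.0 - momentum) * target
```

Restricting the update to mature prototypes keeps a single noisy sample from shifting the boundary, which is the intuition the bullet list above points at.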

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same proxy-simulation plus online-update pattern could be tested in other streaming recognition settings where decision boundaries must evolve without full retraining.
If the attach-versus-create logic generalizes, it might reduce the size of the initial support set needed for reliable open-world performance.
Longer real-world video streams with many novel classes arriving at irregular intervals would provide a direct test of whether mature-prototype updates prevent drift.

Load-bearing premise

Thresholds calibrated offline by simulating the proxy discovery process will align with the changing needs of real-time inference and produce stable categories when they are updated from mature novel prototypes without any dataset-specific adjustments.
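
The premise can be illustrated with a small sketch of how a threshold might be chosen from a proxy split of the support set. This is an assumption, not the paper's procedure: it supposes that some labeled classes are held out as pseudo-novel, that each sample is scored by its maximum similarity to the remaining class prototypes, and that the threshold is picked by balanced accuracy. The function name and the selection criterion are hypothetical.

```python
def pick_threshold(known_scores, novel_scores):
    """Choose tau that best separates pseudo-known from pseudo-novel scores.

    known_scores: max similarity to held-in prototypes for held-in samples.
    novel_scores: the same score for samples of held-out (pseudo-novel) classes.
    The rule evaluated is 'score >= tau means known'.
    """
    candidates = sorted(set(known_scores) | set(novel_scores))
    best_tau, best_acc = candidates[0], -1.0
    for tau in candidates:
        tpr = sum(s >= tau for s in known_scores) / len(known_scores)
        tnr = sum(s < tau for s in novel_scores) / len(novel_scores)
        acc = 0.5 * (tpr + tnr)  # balanced accuracy
        if acc > best_acc:       # ties keep the smaller threshold
            best_tau, best_acc = tau, acc
    return best_tau, best_acc
```

Whether a threshold chosen this way transfers to genuinely novel classes at test time is exactly what the premise assumes and what an ablation would need to check.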

What would settle it

Apply the framework to a long streaming sequence containing gradually introduced novel classes and measure whether the number and purity of formed categories remain consistent over time; if performance falls to the level of fixed-threshold baselines or if clusters fragment, the claim would be refuted.
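
A sketch of how such a check could be instrumented follows. It assumes ground-truth labels are available for evaluation and uses cluster purity as the quality measure. The function names and the fixed reporting interval `step` are hypothetical choices, not something the paper specifies.

```python
from collections import Counter, defaultdict

def purity(assignments, labels):
    """Fraction of samples that carry the majority label of their assigned category."""
    by_cluster = defaultdict(Counter)
    for c, y in zip(assignments, labels):
        by_cluster[c][y] += 1
    majority = sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values())
    return majority / len(labels)

def stability_trace(assignments, labels, step):
    """Report (samples seen, categories formed, purity) every `step` samples."""
    trace = []
    for t in range(step, len(labels) + 1, step):
        a, y = assignments[:t], labels[:t]
        trace.append((t, len(set(a)), purity(a, y)))
    return trace
```

A category count that keeps growing well past the true number of classes, or purity that decays as the stream lengthens, would be the fragmentation and drift signals described above.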

Figures

Figures reproduced from arXiv: 2604.11484 by Bohan Zhang, Weidong Tang, Yanan Wu, Yang Wang, Zhixiang Chi, ZiZhang Wu.

**Figure 1.** Overview of the proposed framework. The model first learns a spherical representation from the support set. At test …

**Figure 2.** Hyperparameter sensitivity analysis of 𝑚, 𝜆_stat, 𝛼, and 𝛽. PACO exhibits robust performance across a wide range of settings, with default configurations (dashed lines) situated in stable regions for all metrics. While SCars New accuracy is sensitive to 𝑚 > 0.6, the model remains remarkably stable across variations in 𝜆_stat, 𝛼, and 𝛽. These results validate the reliability and generalizability of our defa…

**Figure 3.** Visualization of support-set calibration on Stanford …

**Figure 4.** Strict–Hungarian All/Old/New accuracy under dif…

**Figure 6.** Per-sample latency of the inference-time deci…

Original abstract

On-the-Fly Category Discovery (OCD) requires a model, trained on an offline support set, to recognize known classes while discovering new ones from an online streaming sequence. Existing methods focus heavily on offline training. They aim to learn discriminative representations on the support set so that novel classes can be separated at test time. However, their discovery mechanism at inference is typically reduced to a single threshold. We argue that this paradigm is fundamentally flawed as OCD is not a static classification problem, but a dynamic process. The model must continuously decide 1) whether a sample belongs to a known class, 2) matches an existing novel category, or 3) should initiate a new one. Moreover, prior methods treat the support set as fixed knowledge. They do not update their decision boundaries as new evidence arrives during inference. This leads to unstable and inconsistent category formation. Our experiments confirm these issues. With properly calibrated and adaptive thresholds, substantial improvements can be achieved, even without changing the representation. Motivated by this, we propose PACO, a support-set-calibrated, tree-structured online decision framework. The framework models inference as a sequence of hierarchical decisions, including known-class routing, birth-aware novel assignment, and attach-versus-create operations over a dynamic prototype memory. Furthermore, we simulate the proxy discovery process to initialize the thresholds during offline training to align with inference. Thresholds are continuously updated during inference using mature novel prototypes. Importantly, PACO requires no heavy training and no dataset-specific tuning. It can be directly integrated into existing OCD pipelines as an inference-time module. Extensive experiments show significant improvements over SOTA baselines across seven benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PACO replaces single-threshold OCD inference with a hierarchical decision tree and online prototype calibration as an inference-time module, but the reported gains need concrete numbers to judge their size.


PACO's main point is that on-the-fly category discovery is a sequence of decisions rather than one threshold check, so the authors build a tree that routes known classes, handles birth-aware novel assignment, and decides attach-versus-create over a growing prototype memory, with thresholds initialized by proxy simulation and updated online from mature prototypes. This is offered as a lightweight add-on that slots into existing OCD pipelines without retraining or per-dataset tuning.

The shift from static single-threshold methods is the clearest new element. It directly targets the instability that comes from treating the support set as fixed knowledge during streaming inference. The paper does a solid job laying out why prior work leaves category formation inconsistent and why even representation improvements can be undercut by poor decision rules at test time. The inference-time focus is practical and avoids the heavy training emphasis common in the area.

The soft spots sit in the experimental claims. The abstract states that calibrated adaptive thresholds deliver substantial improvements even without representation changes and that results hold across seven benchmarks, yet the strength of those claims depends on seeing the actual deltas, baselines, ablations, and whether the online updates remain stable under different streaming orders. The assumption that proxy simulation during training produces thresholds that stay aligned with real inference dynamics is reasonable on paper but could prove sensitive in practice.

This paper is for researchers working on open-world or streaming vision systems who already have a base OCD model and want to improve the discovery step without starting over. Readers who care about deployable fixes rather than new training objectives will find the most direct value. It deserves a serious referee because the problem it identifies is genuine and the proposed mechanism is concrete enough to test in detail. I would send it to peer review.

Referee Report

2 major / 3 minor

Summary. The paper claims that single-threshold inference in on-the-fly category discovery (OCD) is fundamentally flawed for the dynamic, multi-way decisions required (known-class routing, matching existing novel categories, or creating new ones) and that prior methods fail to update boundaries as evidence arrives during streaming inference. It proposes PACO, a support-set-calibrated tree-structured online decision framework using dynamic prototype memory, with thresholds initialized offline by simulating the proxy discovery process and continuously updated online from mature novel prototypes. The method is presented as a lightweight inference-time module integrable into existing OCD pipelines without heavy retraining or dataset-specific tuning, and experiments across seven benchmarks are said to show significant improvements over SOTA baselines.

Significance. If the reported gains prove robust, this could meaningfully advance OCD research by redirecting attention from representation learning alone to inference-time hierarchical calibration and adaptive thresholds. The practical framing as a plug-in module that improves stability without retraining is a clear strength, and the proxy-simulation idea for threshold alignment offers a plausible way to bridge offline training and online streaming if the alignment holds empirically.

major comments (2)

§3 (method description): The central claim that offline simulation of proxy discovery produces thresholds aligned with online inference needs explicit validation; without an ablation comparing simulated initialization against fixed or random thresholds (and reporting the resulting impact on category stability and accuracy), the assertion of no dataset-specific tuning remains untested and load-bearing for the no-tuning guarantee.
§4 (experiments): The abstract and results claim substantial improvements even without representation changes, yet no quantitative deltas, baseline details, error bars, or statistical significance tests are referenced for the seven benchmarks; this undermines assessment of whether the hierarchical decisions and online updates are the true source of gains versus variance or implementation details.

minor comments (3)

The abstract would be strengthened by including at least one concrete performance number or benchmark name to ground the 'significant improvements' statement.
Introduce formal notation or pseudocode for the attach-versus-create decision rule and the maturity criterion for prototype updates earlier in the method section to improve reproducibility.
Ensure consistent use of terms such as 'mature novel prototypes' with a precise definition (e.g., sample count or confidence threshold) to avoid ambiguity in the online update procedure.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and agree that targeted additions will strengthen the manuscript. We will revise accordingly.


Referee: §3 (method description): The central claim that offline simulation of proxy discovery produces thresholds aligned with online inference needs explicit validation; without an ablation comparing simulated initialization against fixed or random thresholds (and reporting the resulting impact on category stability and accuracy), the assertion of no dataset-specific tuning remains untested and load-bearing for the no-tuning guarantee.

Authors: We agree that an explicit ablation would provide stronger empirical support for the alignment between the offline proxy simulation and online inference. In the revised manuscript, we will add a dedicated ablation study comparing the simulated threshold initialization against fixed and random alternatives. This study will quantify effects on category stability (measured by consistency of novel category assignments across streaming sequences) and discovery accuracy across the benchmarks, directly testing the no dataset-specific tuning property.

Revision: yes

Referee: §4 (experiments): The abstract and results claim substantial improvements even without representation changes, yet no quantitative deltas, baseline details, error bars, or statistical significance tests are referenced for the seven benchmarks; this undermines assessment of whether the hierarchical decisions and online updates are the true source of gains versus variance or implementation details.

Authors: We concur that more granular quantitative reporting is necessary to substantiate the claims. In the revision, we will expand the experimental section with full tables reporting per-benchmark deltas versus each baseline, complete implementation details for all compared methods, error bars computed over multiple random seeds, and statistical significance tests (e.g., paired t-tests with p-values) to isolate the contribution of the hierarchical decision tree and online prototype updates from other sources of variation.

Revision: yes
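
For readers unfamiliar with the paired test mentioned in the response, here is a minimal stdlib sketch of the statistic. It assumes two lists of per-seed accuracies obtained with matched seeds. The function name is hypothetical, and the p-value is deliberately left out because it requires a Student-t distribution, which the Python standard library does not provide.

```python
import math
from statistics import mean, stdev

def paired_t(scores_a, scores_b):
    """Return (t statistic, degrees of freedom) for paired per-seed scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

Pairing by seed removes seed-to-seed variation that both methods share, which is why it suits the question of whether the decision module, rather than run-to-run noise, accounts for a gain.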

Circularity Check

0 steps flagged

No significant circularity


The paper presents PACO as an inference-time module that performs hierarchical decisions over a dynamic prototype memory, with thresholds initialized by simulating the proxy discovery process offline and then updated online from mature prototypes. No equations, derivations, or self-citations are shown that reduce the claimed performance gains or category-formation stability to quantities defined by the inputs themselves. The central argument rests on the described mechanisms (known-class routing, birth-aware assignment, attach-vs-create) rather than any self-definitional loop, fitted-input-as-prediction, or load-bearing self-citation. This matches the provided reader's assessment that no self-referential reduction exists.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 2 invented entities

Ledger based solely on abstract; full paper may contain additional parameters or entities. Thresholds are calibrated via simulation rather than freely fitted ad hoc.

free parameters (1)

thresholds
Initialized via proxy discovery simulation in offline training and updated online using mature novel prototypes

axioms (2)

domain assumption: OCD inference requires continuous hierarchical decisions among known class, match to existing novel category, or creation of new category
Core motivation stated for replacing single-threshold methods with tree-structured framework
domain assumption: Support set provides sufficient calibration signal for inference-time decisions without heavy retraining
Basis for the support-set-calibrated and inference-time module design

invented entities (2)

dynamic prototype memory (no independent evidence)
purpose: Stores and updates representations of novel categories to support attach-versus-create decisions during streaming
New component introduced as part of the tree-structured online framework
tree-structured online decision framework (no independent evidence)
purpose: Organizes the sequence of known-class routing, birth-aware assignment, and attach-versus-create operations
Core proposed architecture for PACO


