PACO: Proxy-Task Alignment and Online Calibration for On-the-Fly Category Discovery
Pith reviewed 2026-05-10 16:44 UTC · model grok-4.3
The pith
A tree-structured decision process with proxy-initialized and online-updated thresholds improves stability in on-the-fly category discovery.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On-the-fly category discovery (OCD) is a dynamic process that requires continuous decisions: known-class routing, birth-aware novel assignment, and attach-versus-create operations over a dynamic prototype memory. The paper calibrates the decision thresholds offline by simulating a proxy discovery task, so that they match what inference needs, and then updates them online from mature novel prototypes. The claim is that the resulting tree-structured framework forms categories stably without heavy retraining or dataset-specific tuning.
What carries the argument
The support-set-calibrated tree-structured online decision framework that sequences known-class routing, birth-aware novel assignment, and attach-versus-create operations over a dynamic prototype memory, with thresholds initialized by proxy simulation and updated from mature novel prototypes.
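The three-way decision described above can be sketched as a small routing function. This is an illustrative reading only, not the paper's rule: the cosine similarity, the two thresholds `tau_known` and `tau_attach`, the running-mean prototype update, and every name below are assumptions made for the sketch. The summary does not say how "birth-aware" assignment is scored, so it is reduced here to a plain create step.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class PrototypeMemory:
    # Hypothetical container; the field names are not taken from the paper.
    known: dict                                  # class label -> prototype vector
    novel: dict = field(default_factory=dict)    # novel id -> prototype vector
    counts: dict = field(default_factory=dict)   # novel id -> attached sample count


def best_match(x, prototypes):
    """Return (label, similarity) of the closest prototype, or (None, -1.0) if empty."""
    best = (None, -1.0)
    for label, proto in prototypes.items():
        s = cosine(x, proto)
        if s > best[1]:
            best = (label, s)
    return best


def decide(x, mem, tau_known, tau_attach):
    """Route one streaming sample through the assumed decision tree."""
    # Node 1: known-class routing.
    label, sim = best_match(x, mem.known)
    if sim >= tau_known:
        return ("known", label)
    # Node 2: attach to an existing novel prototype if it is close enough.
    nid, sim = best_match(x, mem.novel)
    if nid is not None and sim >= tau_attach:
        n = mem.counts[nid]
        # Running-mean update of the attached prototype.
        mem.novel[nid] = [(p * n + xi) / (n + 1) for p, xi in zip(mem.novel[nid], x)]
        mem.counts[nid] = n + 1
        return ("attach", nid)
    # Node 3: create a new novel category seeded by this sample.
    nid = len(mem.novel)
    mem.novel[nid] = list(x)
    mem.counts[nid] = 1
    return ("create", nid)
```

Keeping the two thresholds separate is what distinguishes this from the single-threshold inference the paper criticizes: the known/novel boundary and the attach/create boundary can be calibrated and updated independently.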
If this is right
- Existing OCD pipelines gain an inference-time module that improves known and novel class handling without retraining the underlying representation.
- Thresholds adapt continuously during inference, reducing the inconsistency that arises from static boundaries.
- No dataset-specific tuning is required, so the same framework can be deployed across different streaming benchmarks.
- Dynamic prototype memory supports attach-versus-create decisions that keep category formation coherent as new samples arrive.
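One way to picture the continuous threshold adaptation in these points is an exponential moving average driven only by mature prototypes. The sketch below is an assumption throughout: the summary does not define maturity or the update statistic, so `min_count`, `momentum`, `margin`, and the per-prototype similarity summaries are hypothetical choices.

```python
def update_threshold(tau, proto_stats, min_count=5, momentum=0.9, margin=1.0):
    """Blend the current threshold with a statistic from mature novel prototypes.

    proto_stats: list of (count, mean_sim, std_sim), one tuple per novel
    prototype, where mean_sim and std_sim summarise the similarities of the
    attached samples to that prototype.
    """
    # Assumed maturity criterion: a prototype with at least min_count samples.
    mature = [(m, s) for c, m, s in proto_stats if c >= min_count]
    if not mature:
        return tau  # no mature evidence yet, keep the calibrated value
    # Candidate threshold: average lower edge of the mature clusters.
    target = sum(m - margin * s for m, s in mature) / len(mature)
    # The moving average keeps a single noisy prototype from moving tau abruptly.
    return momentum * tau + (1.0 - momentum) * target
```

Ignoring immature prototypes is the design choice that matters here: a category with one or two samples carries too little evidence to shift a boundary that every later decision depends on.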
Where Pith is reading between the lines
- The same proxy-simulation plus online-update pattern could be tested in other streaming recognition settings where decision boundaries must evolve without full retraining.
- If the attach-versus-create logic generalizes, it might reduce the size of the initial support set needed for reliable open-world performance.
- Longer real-world video streams with many novel classes arriving at irregular intervals would provide a direct test of whether mature-prototype updates prevent drift.
Load-bearing premise
Thresholds calibrated offline by simulating the proxy discovery process will align with the changing needs of real-time inference and produce stable categories when they are updated from mature novel prototypes without any dataset-specific adjustments.
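A minimal sketch of what proxy-task calibration could look like, assuming the proxy task hides some support classes and treats them as pseudo-novel. The class-splitting scheme, the balanced-accuracy criterion, and all function names are assumptions for illustration; the summary does not describe the paper's actual simulation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def calibrate_known_threshold(support, pseudo_novel_classes):
    """Pick the known/novel threshold from a simulated discovery task.

    support: dict mapping class label -> list of feature vectors.
    pseudo_novel_classes: labels withheld from the prototypes to mimic novelty.
    """
    protos = {c: mean_vector(v) for c, v in support.items()
              if c not in pseudo_novel_classes}
    scored = []  # (max similarity to any pseudo-known prototype, is_known)
    for c, vectors in support.items():
        for x in vectors:
            s = max(cosine(x, p) for p in protos.values())
            scored.append((s, c not in pseudo_novel_classes))
    n_known = sum(1 for _, k in scored if k)
    n_novel = len(scored) - n_known
    if n_known == 0 or n_novel == 0:
        raise ValueError("need both pseudo-known and pseudo-novel samples")
    best_tau, best_acc = 0.0, -1.0
    for tau in sorted({s for s, _ in scored}):
        tpr = sum(1 for s, k in scored if k and s >= tau) / n_known
        tnr = sum(1 for s, k in scored if not k and s < tau) / n_novel
        acc = 0.5 * (tpr + tnr)  # balanced accuracy of the known/novel split
        if acc > best_acc:
            best_tau, best_acc = tau, acc
    return best_tau
```

Note that pseudo-known samples are scored against prototypes they helped build, which biases their similarities upward. A more careful simulation would hold those samples out, and that gap is one concrete way the offline threshold could fail to match online inference.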
What would settle it
Apply the framework to a long streaming sequence containing gradually introduced novel classes and measure whether the number and purity of formed categories remain consistent over time; if performance falls to the level of fixed-threshold baselines or if clusters fragment, the claim would be refuted.
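The measurement proposed here can be sketched as a checkpointed purity report. Ground-truth labels are used for evaluation only, and the function name, argument layout, and choice of cluster purity as the metric are assumptions for this sketch rather than the paper's protocol.

```python
from collections import Counter, defaultdict


def purity_over_time(assignments, truths, checkpoints):
    """Report category count and purity at several points in the stream.

    assignments[i]: predicted category id of sample i, in stream order.
    truths[i]: ground-truth class of sample i (evaluation only).
    checkpoints: stream positions (number of samples seen) to report at.
    Returns a list of (position, n_categories, purity).
    """
    report = []
    for t in checkpoints:
        seen = list(zip(assignments[:t], truths[:t]))
        clusters = defaultdict(Counter)
        for pred, true in seen:
            clusters[pred][true] += 1
        # Purity: fraction of samples that match their cluster's majority class.
        majority = sum(c.most_common(1)[0][1] for c in clusters.values())
        purity = majority / len(seen) if seen else 0.0
        report.append((t, len(clusters), purity))
    return report
```

A category count that keeps growing while purity stays flat would indicate fragmentation, and falling purity with a stable count would indicate that novel classes are being merged.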
Original abstract
On-the-Fly Category Discovery (OCD) requires a model, trained on an offline support set, to recognize known classes while discovering new ones from an online streaming sequence. Existing methods focus heavily on offline training. They aim to learn discriminative representations on the support set so that novel classes can be separated at test time. However, their discovery mechanism at inference is typically reduced to a single threshold. We argue that this paradigm is fundamentally flawed as OCD is not a static classification problem, but a dynamic process. The model must continuously decide 1) whether a sample belongs to a known class, 2) matches an existing novel category, or 3) should initiate a new one. Moreover, prior methods treat the support set as fixed knowledge. They do not update their decision boundaries as new evidence arrives during inference. This leads to unstable and inconsistent category formation. Our experiments confirm these issues. With properly calibrated and adaptive thresholds, substantial improvements can be achieved, even without changing the representation. Motivated by this, we propose PACO, a support-set-calibrated, tree-structured online decision framework. The framework models inference as a sequence of hierarchical decisions, including known-class routing, birth-aware novel assignment, and attach-versus-create operations over a dynamic prototype memory. Furthermore, we simulate the proxy discovery process to initialize the thresholds during offline training to align with inference. Thresholds are continuously updated during inference using mature novel prototypes. Importantly, PACO requires no heavy training and no dataset-specific tuning. It can be directly integrated into existing OCD pipelines as an inference-time module. Extensive experiments show significant improvements over SOTA baselines across seven benchmarks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that single-threshold inference in on-the-fly category discovery (OCD) is fundamentally flawed for the dynamic, multi-way decisions required (known-class routing, matching existing novel categories, or creating new ones) and that prior methods fail to update boundaries as evidence arrives during streaming inference. It proposes PACO, a support-set-calibrated tree-structured online decision framework using dynamic prototype memory, with thresholds initialized offline by simulating the proxy discovery process and continuously updated online from mature novel prototypes. The method is presented as a lightweight inference-time module integrable into existing OCD pipelines without heavy retraining or dataset-specific tuning, and experiments across seven benchmarks are said to show significant improvements over SOTA baselines.
Significance. If the reported gains prove robust, this could meaningfully advance OCD research by redirecting attention from representation learning alone to inference-time hierarchical calibration and adaptive thresholds. The practical framing as a plug-in module that improves stability without retraining is a clear strength, and the proxy-simulation idea for threshold alignment offers a plausible way to bridge offline training and online streaming if the alignment holds empirically.
major comments (2)
- §3 (method description): The central claim that offline simulation of proxy discovery produces thresholds aligned with online inference needs explicit validation; without an ablation comparing simulated initialization against fixed or random thresholds (and reporting the resulting impact on category stability and accuracy), the assertion of no dataset-specific tuning remains untested and load-bearing for the no-tuning guarantee.
- §4 (experiments): The abstract and results claim substantial improvements even without representation changes, yet no quantitative deltas, baseline details, error bars, or statistical significance tests are referenced for the seven benchmarks; this undermines assessment of whether the hierarchical decisions and online updates are the true source of gains versus variance or implementation details.
minor comments (3)
- The abstract would be strengthened by including at least one concrete performance number or benchmark name to ground the 'significant improvements' statement.
- Introduce formal notation or pseudocode for the attach-versus-create decision rule and the maturity criterion for prototype updates earlier in the method section to improve reproducibility.
- Ensure consistent use of terms such as 'mature novel prototypes' with a precise definition (e.g., sample count or confidence threshold) to avoid ambiguity in the online update procedure.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below and agree that targeted additions will strengthen the manuscript. We will revise accordingly.
- Referee: §3 (method description): The central claim that offline simulation of proxy discovery produces thresholds aligned with online inference needs explicit validation; without an ablation comparing simulated initialization against fixed or random thresholds (and reporting the resulting impact on category stability and accuracy), the assertion of no dataset-specific tuning remains untested and load-bearing for the no-tuning guarantee.
  Authors: We agree that an explicit ablation would provide stronger empirical support for the alignment between the offline proxy simulation and online inference. In the revised manuscript, we will add a dedicated ablation study comparing the simulated threshold initialization against fixed and random alternatives. This study will quantify effects on category stability (measured by consistency of novel category assignments across streaming sequences) and discovery accuracy across the benchmarks, directly testing the claim that no dataset-specific tuning is needed.
  Revision: yes
- Referee: §4 (experiments): The abstract and results claim substantial improvements even without representation changes, yet no quantitative deltas, baseline details, error bars, or statistical significance tests are referenced for the seven benchmarks; this undermines assessment of whether the hierarchical decisions and online updates are the true source of gains versus variance or implementation details.
  Authors: We concur that more granular quantitative reporting is necessary to substantiate the claims. In the revision, we will expand the experimental section with full tables reporting per-benchmark deltas versus each baseline, complete implementation details for all compared methods, error bars computed over multiple random seeds, and statistical significance tests (e.g., paired t-tests with p-values) to isolate the contribution of the hierarchical decision tree and online prototype updates from other sources of variation.
  Revision: yes
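For the paired t-test mentioned in this response, the statistic over per-seed scores can be sketched with the standard library alone. The function name and argument layout are illustrative assumptions. The p-value is deliberately left out because it needs the Student t distribution, which would normally come from a statistics package.

```python
import math
import statistics


def paired_t_statistic(method_scores, baseline_scores):
    """Return (t, degrees_of_freedom) for paired per-seed scores.

    Both lists must hold one score per seed, in the same seed order.
    """
    if len(method_scores) != len(baseline_scores) or len(method_scores) < 2:
        raise ValueError("need at least two paired runs of equal length")
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    # t = mean difference divided by its standard error.
    return mean_diff / (sd / math.sqrt(n)), n - 1
```

Pairing by seed matters because it removes seed-to-seed variance shared by both methods, which is the variance the referee suspects might explain the reported gains.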
Circularity Check
No significant circularity
The paper presents PACO as an inference-time module that performs hierarchical decisions over a dynamic prototype memory, with thresholds initialized by simulating the proxy discovery process offline and then updated online from mature prototypes. No equations, derivations, or self-citations are shown that reduce the claimed performance gains or category-formation stability to quantities defined by the inputs themselves. The central argument rests on the described mechanisms (known-class routing, birth-aware assignment, attach-vs-create) rather than any self-definitional loop, fitted-input-as-prediction, or load-bearing self-citation. This matches the provided reader's assessment that no self-referential reduction exists.
Axiom & Free-Parameter Ledger
free parameters (1)
- thresholds
axioms (2)
- domain assumption: OCD inference requires continuous hierarchical decisions among known class, match to existing novel category, or creation of new category
- domain assumption: Support set provides sufficient calibration signal for inference-time decisions without heavy retraining
invented entities (2)
- dynamic prototype memory: no independent evidence
- tree-structured online decision framework: no independent evidence