pith. machine review for the scientific record. sign in

arxiv: 2605.02580 · v1 · submitted 2026-05-04 · 💻 cs.CV · cs.AI· cs.RO

Recognition: 3 theorem links

· Lean Theorem

Hyp2Former: Hierarchy-Aware Hyperbolic Embeddings for Open-Set Panoptic Segmentation

Authors on Pith no claims yet

Pith reviewed 2026-05-08 18:44 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.RO
keywords open-set panoptic segmentationhyperbolic embeddingshierarchical semanticsunknown object detectionsemantic hierarchy
0
0 comments X

The pith

Hyperbolic embeddings of known category hierarchies place unknown objects near parent concepts for reliable open-set detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that open-set panoptic segmentation improves when known thing and stuff classes are embedded in hyperbolic space according to their semantic hierarchy. This structured space keeps fine-grained unknowns close to higher-level abstractions such as animal or object rather than scattering them randomly among unrelated categories. Because the model never sees explicit unknowns during training, detection of novel instances follows directly from their measured proximity to the learned parent concepts. Experiments on COCO, Cityscapes, and Lost&Found show better unknown discovery while preserving accuracy on known classes. The approach therefore replaces flat classification with continuous hierarchical similarity as the basis for separating in-distribution from out-of-distribution pixels.

Core claim

Hyp2Former learns an end-to-end hyperbolic embedding space in which known categories are positioned so that semantic parent-child relations are preserved at multiple levels of abstraction. Unknown objects, never seen in training, nevertheless lie closer to the appropriate higher-level concepts than to unrelated ones, allowing them to be segmented as separate instances without any auxiliary unknown modeling.

What carries the argument

Hierarchy-aware hyperbolic embeddings that encode multi-level semantic similarities among known categories so that proximity to parent concepts serves as the detection signal for unknowns.

If this is right

  • Unknown objects can be segmented as distinct instances solely from their distance to higher-level concepts without retraining or outlier modeling.
  • The same embedding space supports both closed-set accuracy on known classes and open-set discovery on novel ones.
  • Hierarchical structure reduces confusion between semantically distant categories when unknowns appear.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be tested on datasets with explicit multi-level taxonomies to measure how depth of hierarchy affects unknown placement.
  • If hyperbolic geometry is replaced by Euclidean space while keeping the hierarchy loss, performance would likely degrade because flat distances do not naturally respect parent-child nesting.
  • The approach suggests that any open-set task with implicit category structure might benefit from embedding unknowns relative to learned abstractions rather than treating them as pure outliers.

Load-bearing premise

The hierarchy learned only from known classes will automatically locate unseen objects near their correct higher-level parents in the embedding space.

What would settle it

Measure embedding distances: if a large fraction of held-out unknown objects lie farther from their semantic parents than from unrelated known classes, the detection mechanism fails.

Figures

Figures reproduced from arXiv: 2605.02580 by Abhinav Valada, Florian Drews, Rohit Mohan, Yakov Miron, Yao Lu.

Figure 1
Figure 1. Figure 1: Overview of the proposed Hyp2Former. Multi-scale features are extracted from a backbone and are processed by a pixel and transformer decoder. A set of learn￾able queries interacts with the decoder to produce query embeddings, which are fed into the classification head Fcls and mask head Fmask. In parallel, the embeddings are projected into a hyperbolic manifold, where an explicit semantic hierarchy guides … view at source ↗
Figure 2
Figure 2. Figure 2: Qualitative comparison of unknown prediction. view at source ↗
Figure 3
Figure 3. Figure 3: Real-world zero-shot evaluation. Unknown objects are marked in red. Without fine-tuning, Hyp2Former generalizes to unseen real-world scenes and reliably detects unknown objects under distributional shift view at source ↗
Figure 4
Figure 4. Figure 4: Hierarchical distance visualization. Hyperbolic distances between mask embeddings and proxies are shown across hierarchy levels (closest in yellow, farthest in grey). For unknown regions, the nearest leaf proxies remain interpretable, indicating that embeddings follow the learned hierarchy. In Cityscapes [9], pole is defined as background (stuff ) rather than object (thing), although it is countable. Hyper… view at source ↗
Figure 1
Figure 1. Figure 1: Hierarchy definition for Cityscapes [9] with 3 levels (H3). root object ignored human person rider motorcycle bicycle vehicle none-drivable background flat construction street elements truck bus car train vegetation terrain sky road sidewalk building wall fence traffic light traffic sign pole view at source ↗
Figure 2
Figure 2. Figure 2: Hierarchy definition for Cityscapes [9] with 3 levels (H4). root object ignored human person rider motorcycle bicycle vehicle nature none-drivable background flat construction street elements truck bus car train vegetation terrain sky road sidewalk building wall fence traffic light traffic sign pole two-wheeled four-wheeled concrete traffic view at source ↗
Figure 3
Figure 3. Figure 3: Hierarchy definition for Cityscapes [9] with 3 levels (H5) view at source ↗
Figure 4
Figure 4. Figure 4: Hierarchy definition for MS COCO [32] with 4 levels. The classes that are considered unknown are marked in red view at source ↗
Figure 5
Figure 5. Figure 5: Qualitative comparison of unknown object prediction. view at source ↗
Figure 6
Figure 6. Figure 6: Open-set Panoptic Segmetation Results. Images are from MS COCO [32]. Annotated unknown objects are outlined in orange and are reflected in the quanti￾tative evaluation, while additional valid but unannotated ones are outlined in yel￾low. Predicted unknown objects are shown in red. Qualitative results demonstrating Hyp2Former’s ability to detect unknown objects while segmenting known classes view at source ↗
read the original abstract

Recognizing unknown objects is crucial for safety-critical applications such as autonomous driving and robotics. Open-Set Panoptic Segmentation (OPS) aims to segment known thing and stuff classes while identifying valid unknown objects as separate instances. Prior OPS approaches largely treat known categories as a flat label set, ignoring the semantic hierarchy that provides valuable structural priors for distinguishing unknown objects from in-distribution classes. In this work, we propose Hyp2Former, an end-to-end framework for OPS that does not require explicit modeling of unknowns during training, and instead learns hierarchical semantic similarities continuously in hyperbolic space. By explicitly encoding hierarchical relationships among known categories, the model learns a structured embedding space that captures multiple levels of semantic abstraction. As a result, unknown objects that cannot be confidently classified as known categories still remain in close proximity to higher-level concepts (e.g., an unknown animal remains closer to "animal" or "object" than to unrelated concepts such as "electronics" or "stuff") and can therefore be reliably detected, even if their fine-grained category was not represented during training. Empirical evaluations across multiple public datasets such as MS COCO, Cityscapes, and Lost&Found demonstrate that Hyp2Former outperforms existing methods on OPS, achieving the best balance between unknown object discovery and in-distribution robustness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes Hyp2Former, an end-to-end framework for open-set panoptic segmentation (OPS) that embeds known thing/stuff categories into hyperbolic space to explicitly encode their semantic hierarchy. Training uses only known-category supervision with hierarchy-aware losses; at inference, unknowns are detected as instances whose embeddings lie closer to higher-level parent concepts than to unrelated known leaves. The authors claim this yields the best trade-off between unknown discovery and in-distribution robustness on MS COCO, Cityscapes, and Lost&Found.

Significance. If the central generalization claim holds, the work offers a principled alternative to outlier-exposure or generative open-set methods by exploiting the tree structure already present in semantic taxonomies. Hyperbolic geometry is a natural fit for hierarchies, and an end-to-end panoptic architecture that avoids explicit unknown modeling during training would be a useful contribution to safety-critical segmentation.

major comments (3)
  1. [§3.2] §3.2 (Hyperbolic Hierarchy Loss): The loss is defined exclusively over pairs of known categories and their ancestors; no term, regularizer, or architectural constraint pulls embeddings of out-of-distribution visual features toward the appropriate higher-level nodes. Consequently the detection rule (proximity to parents) rests on an unverified inductive bias rather than an enforced property.
  2. [§4] §4 (Experiments): No ablation isolates the contribution of the hyperbolic hierarchy loss versus a Euclidean baseline with the same hierarchy loss; no embedding-distance statistics or t-SNE-style visualizations are provided to confirm that unknown instances land nearer to their putative parents than to unrelated concepts. Without these, the central claim cannot be verified from the reported numbers alone.
  3. [§4.3] §4.3 (Lost&Found results): The reported gains in unknown recall are presented without error bars across multiple random seeds or statistical significance tests against the strongest baseline, making it impossible to judge whether the improvement is robust or within the variance of the training protocol.
minor comments (2)
  1. The notation for the hyperbolic operations (exp, log, Möbius addition) is introduced without a short self-contained recap or reference to a standard reference such as Nickel & Kiela (2017), which would aid readers unfamiliar with the manifold.
  2. Figure 3 caption should explicitly state the distance metric used to color the unknown points (e.g., hyperbolic distance to the nearest ancestor).

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for your detailed review and valuable suggestions. We appreciate the opportunity to clarify the design choices and strengthen the experimental validation of Hyp2Former. Below we respond to each major comment.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (Hyperbolic Hierarchy Loss): The loss is defined exclusively over pairs of known categories and their ancestors; no term, regularizer, or architectural constraint pulls embeddings of out-of-distribution visual features toward the appropriate higher-level nodes. Consequently the detection rule (proximity to parents) rests on an unverified inductive bias rather than an enforced property.

    Authors: We acknowledge that the hierarchy loss is applied only to known categories during training. The design relies on the properties of hyperbolic space, where the learned hierarchical structure among known classes creates an embedding geometry in which out-of-distribution features, not matching any specific leaf, are positioned closer to higher-level parent nodes. This follows from the tree-like embedding properties of hyperbolic geometry. We will revise §3.2 to provide a more detailed explanation of this inductive bias and its role in OOD detection. revision: partial

  2. Referee: [§4] §4 (Experiments): No ablation isolates the contribution of the hyperbolic hierarchy loss versus a Euclidean baseline with the same hierarchy loss; no embedding-distance statistics or t-SNE-style visualizations are provided to confirm that unknown instances land nearer to their putative parents than to unrelated concepts. Without these, the central claim cannot be verified from the reported numbers alone.

    Authors: We agree that these additional analyses would help verify the central claim. In the revised manuscript, we will add an ablation study comparing Hyp2Former with a Euclidean variant using the same hierarchy loss. We will also include visualizations of the embeddings (such as projections in the Poincaré disk) and quantitative distance statistics demonstrating that unknown instances are closer to their semantic parents than to unrelated known categories. revision: yes

  3. Referee: [§4.3] §4.3 (Lost&Found results): The reported gains in unknown recall are presented without error bars across multiple random seeds or statistical significance tests against the strongest baseline, making it impossible to judge whether the improvement is robust or within the variance of the training protocol.

    Authors: To address this, we will perform additional experiments with multiple random seeds and report the mean and standard deviation for the metrics on the Lost&Found dataset. We will also include statistical significance testing against the baselines to confirm the robustness of the observed improvements. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on generalization in hyperbolic space

full rationale

The paper's core argument is that encoding known-category hierarchies in hyperbolic embeddings causes unknown objects to land near higher-level concepts for detection. This is presented as a consequence of the learned structured space rather than any definitional equivalence or fitted parameter renamed as prediction. No equations, self-citations, or ansatzes are quoted that reduce the unknown-proximity claim to the training inputs by construction. The approach relies on inductive biases of hyperbolic geometry and out-of-distribution generalization, which are external to the derivation and not tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that hyperbolic geometry naturally encodes semantic hierarchies and that proximity in this space suffices for unknown detection; no free parameters or new entities are described in the abstract.

axioms (1)
  • domain assumption Hyperbolic space can represent hierarchical semantic relationships among object categories more effectively than Euclidean space.
    Invoked to justify the embedding choice for capturing multiple levels of abstraction.

pith-pipeline@v0.9.0 · 5544 in / 1149 out tokens · 42398 ms · 2026-05-08T18:44:20.830527+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 6 canonical work pages

  1. [1]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Atigh, M.G., Schoep, J., Acar, E., Van Noord, N., Mettes, P.: Hyperbolic image segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 4453–4462 (2022)

  2. [2]

    In: Proceedings of the IEEE conference on computer vision and pattern recognition

    Bendale, A., Boult, T.E.: Towards open set deep networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1563–1572 (2016)

  3. [3]

    Flavors of geometry31(59-115), 2 (1997)

    Cannon, J.W., Floyd, W.J., Kenyon, R., Parry, W.R., et al.: Hyperbolic geometry. Flavors of geometry31(59-115), 2 (1997)

  4. [4]

    SAM 3: Segment Anything with Concepts

    Carion, N., Gustafson, L., Hu, Y.T., Debnath, S., Hu, R., Suris, D., Ryali, C., Alwala,K.V.,Khedr,H.,Huang,A.,etal.:Sam3:Segmentanythingwithconcepts. arXiv preprint arXiv:2511.16719 (2025)

  5. [5]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Cheng, B., Collins, M.D., Zhu, Y., Liu, T., Huang, T.S., Adam, H., Chen, L.C.: Panoptic-deeplab: A simple, strong, and fast baseline for bottom-up panoptic seg- mentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 12475–12485 (2020)

  6. [6]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Cheng, B., Misra, I., Schwing, A.G., Kirillov, A., Girdhar, R.: Masked-attention mask transformer for universal image segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 1290–1299 (2022)

  7. [7]

    Advances in neural information processing systems34, 17864–17875 (2021)

    Cheng, B., Schwing, A., Kirillov, A.: Per-pixel classification is not all you need for semantic segmentation. Advances in neural information processing systems34, 17864–17875 (2021)

  8. [8]

    IEEE Transactions on Pattern Analysis and Machine Intelligence (2026)

    Conti, A., Fini, E., Mancini, M., Rota, P., Wang, Y., Ricci, E.: Vocabulary-free image classification and semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (2026)

  9. [9]

    In: Proceedings of the IEEE conference on computer vision and pattern recognition

    Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 3213–3223 (2016)

  10. [10]

    In: International Conference on Machine Learning

    Desai, K., Nickel, M., Rajpurohit, T., Johnson, J., Vedantam, S.R.: Hyperbolic image-text representations. In: International Conference on Machine Learning. pp. 7694–7731 (2023)

  11. [11]

    In: Proceedings of the AAAI Conference on Artificial Intelligence

    Doan, T., Li, X., Behpour, S., He, W., Gou, L., Ren, L.: Hyp-ow: Exploiting hierar- chical structure learning with hyperbolic distance enhances open world object de- tection. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 38, pp. 1555–1563 (2024)

  12. [12]

    In: Proceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    Ermolov, A., Mirvakhabova, L., Khrulkov, V., Sebe, N., Oseledets, I.: Hyperbolic vision transformers: Combining improvements in metric learning. In: Proceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 7409–7419 (2022)

  13. [13]

    Ester, M., Kriegel, H.P., Sander, J., Xu, X., et al.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: kdd. vol. 96, pp. 226–231 (1996) 16 Y. Lu et al

  14. [14]

    Advances in neural information processing systems31(2018)

    Ganea, O., Bécigneul, G., Hofmann, T.: Hyperbolic neural networks. Advances in neural information processing systems31(2018)

  15. [15]

    In: Proceedings of the IEEE/CVF international conference on computer vision

    Gasperini, S., Marcos-Ramiro, A., Schmidt, M., Navab, N., Busam, B., Tombari, F.: Segmenting known objects and unseen unknowns without prior knowledge. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 19321–19332 (2023)

  16. [16]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Ge, S., Mishra, S., Kornblith, S., Li, C.L., Jacobs, D.: Hyperbolic contrastive learn- ing for visual representations beyond objects. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 6840–6849 (2023)

  17. [17]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Grcić, M., Šarić, J., Šegvić, S.: On advantages of mask-level recognition for outlier- aware segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2937–2947 (2023)

  18. [18]

    He,K.,Zhang,X.,Ren,S.,Sun,J.:Deepresiduallearningforimagerecognition.In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)

  19. [19]

    IEEE Robotics and Automation Letters10(2), 1904–1911 (2024)

    Hindel, J., Cattaneo, D., Valada, A.: Taxonomy-aware continual semantic seg- mentation in hyperbolic spaces for open-world perception. IEEE Robotics and Automation Letters10(2), 1904–1911 (2024)

  20. [20]

    In: International Workshop on Applications of Medical AI

    Hindel, J., Mekic, E., Enamundram, N.K., Mohan, R., Cattaneo, D., Kalweit, M., Valada, A.: Dynamic robot-assisted surgery with hierarchical class-incremental semantic segmentation. In: International Workshop on Applications of Medical AI. pp. 246–257 (2025)

  21. [21]

    In: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

    Hindel, J., Mohan, R., Bratulić, J., Cattaneo, D., Brox, T., Valada, A.: Label- efficient lidar semantic segmentation with 2d-3d vision transformer adapters. In: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 4115–4122. IEEE (2025)

  22. [22]

    IEEE Robotics and Automation Letters (2026)

    Hurtado, J.V., Mohan, R., Valada, A.: Hyperspectral adapter for semantic seg- mentation with vision foundation models. IEEE Robotics and Automation Letters (2026)

  23. [23]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Hwang, J., Oh, S.W., Lee, J.Y., Han, B.: Exemplar-based open-set panoptic seg- mentation network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1175–1184 (2021)

  24. [24]

    In: IEEE International conference on robotics and automation (ICRA)

    Käppeler, M., Petek, K., Vödisch, N., Burgard, W., Valada, A.: Few-shot panop- tic segmentation with foundation models. In: IEEE International conference on robotics and automation (ICRA). pp. 7718–7724 (2024)

  25. [25]

    Khrulkov, V., Mirvakhabova, L., Ustinova, E., Oseledets, I., Lempitsky, V.: Hyper- bolicimageembeddings.In:ProceedingsoftheIEEE/CVFconferenceoncomputer vision and pattern recognition. pp. 6418–6428 (2020)

  26. [26]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Kim, S., Jeong, B., Kwak, S.: Hier: Metric learning beyond class labels via hier- archical regularization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 19903–19912 (2023)

  27. [27]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    Kim, S., Kim, D., Cho, M., Kwak, S.: Proxy anchor loss for deep metric learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 3238–3247 (2020)

  28. [28]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recog- nition

    Kirillov,A., Girshick,R., He,K.,Dollár,P.:Panopticfeaturepyramid networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recog- nition. pp. 6399–6408 (2019)

  29. [29]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recog- nition

    Kirillov, A., He, K., Girshick, R., Rother, C., Dollár, P.: Panoptic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recog- nition. pp. 9404–9413 (2019) Hyp2Former 17

  30. [30]

    In: DAGM German Conference on Pattern Recognition

    Lang, C., Braun, A., Schillingmann, L., Valada, A.: On hyperbolic embeddings in object detection. In: DAGM German Conference on Pattern Recognition. pp. 462–476 (2022)

  31. [31]

    In: International conference on machine learning

    Law, M., Liao, R., Snell, J., Zemel, R.: Lorentzian distance learning for hyperbolic representations. In: International conference on machine learning. pp. 3672–3681 (2019)

  32. [32]

    In: European conference on computer vision

    Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: European conference on computer vision. pp. 740–755 (2014)

  33. [33]

    In: International Conference on Learning Representations (2017)

    Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: International Conference on Learning Representations (2017)

  34. [34]

    arXiv preprint arXiv:2602.23172 (2026)

    Luz, M., Mohan, R., Nürnberg, T., Miron, Y., Cattaneo, D., Valada, A.: La- tent gaussian splatting for 4d panoptic occupancy tracking. arXiv preprint arXiv:2602.23172 (2026)

  35. [35]

    In: 2024 IEEE International Conference on Robotics and Automation (ICRA)

    Luz, M., Mohan, R., Sekkat, A.R., Sawade, O., Matthes, E., Brox, T., Valada, A.: Amodal optical flow. In: 2024 IEEE International Conference on Robotics and Automation (ICRA). pp. 14677–14684. IEEE (2024)

  36. [36]

    International Journal of Computer Vision132(9), 3484–3508 (2024)

    Mettes, P., Ghadimi Atigh, M., Keller-Ressel, M., Gu, J., Yeung, S.: Hyperbolic deep learning in computer vision: A survey. International Journal of Computer Vision132(9), 3484–3508 (2024)

  37. [37]

    arXiv preprint arXiv:2602.08006 (2026)

    Mohan, R., Hurtado, J.V., Mohan, R., Valada, A.: Forecastocc: Vision-based se- mantic occupancy forecasting. arXiv preprint arXiv:2602.08006 (2026)

  38. [38]

    IEEE Robotics and Automation Letters9(8), 7094–7101 (2024)

    Mohan,R.,Arce,J.,Mokhtar,S.,Cattaneo,D.,Valada,A.:Syn-mediverse:amulti- modal synthetic dataset for intelligent scene understanding of healthcare facilities. IEEE Robotics and Automation Letters9(8), 7094–7101 (2024)

  39. [39]

    arXiv preprint arXiv:2602.19349 (2026)

    Mohan, R., Drews, F., Miron, Y., Cattaneo, D., Valada, A.: Up-fuse: Uncertainty- guided lidar-camera fusion for 3d panoptic segmentation. arXiv preprint arXiv:2602.19349 (2026)

  40. [40]

    In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

    Mohan, R., Hindel, J., Drews, F., Gläser, C., Cattaneo, D., Valada, A.: Open-set lidar panoptic segmentation guided by uncertainty-aware learning. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 2224–2231 (2025)

  41. [41]

    IEEE Robotics and Automation Letters9(5), 4075–4082 (2024)

    Mohan, R., Kumaraswamy, K., Hurtado, J.V., Petek, K., Valada, A.: Panoptic out-of-distribution segmentation. IEEE Robotics and Automation Letters9(5), 4075–4082 (2024)

  42. [42]

    IEEE Robotics and Automation Letters7(4), 9302–9309 (2022)

    Mohan, R., Valada, A.: Perceiving the invisible: Proposal-free amodal panoptic segmentation. IEEE Robotics and Automation Letters7(4), 9302–9309 (2022)

  43. [43]

    In: Proceedings of the IEEE/CVF International Conference on Computer Vision

    Nayal, N., Yavuz, M., Henriques, J.F., Güney, F.: Rba: Segmenting unknown re- gions rejected by all. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 711–722 (2023)

  44. [44]

    Advances in neural information processing systems30(2017)

    Nickel, M., Kiela, D.: Poincaré embeddings for learning hierarchical representa- tions. Advances in neural information processing systems30(2017)

  45. [45]

    In: IEEE/RSJ In- ternational Conference on Intelligent Robots and Systems (IROS)

    Pinggera, P., Ramos, S., Gehrig, S., Franke, U., Rother, C., Mester, R.: Lost and found: detecting small road hazards for self-driving vehicles. In: IEEE/RSJ In- ternational Conference on Intelligent Robots and Systems (IROS). pp. 1099–1106 (2016)

  46. [46]

    In: International conference on machine learning

    Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International conference on machine learning. pp. 8748–8763 (2021) 18 Y. Lu et al

  47. [47]

    IEEE Transactions on Pattern Analysis and Machine Intelligence46(12), 9286–9302 (2024)

    Rai, S.N., Cermelli, F., Caputo, B., Masone, C.: Mask2anomaly: Mask transformer for universal open-set segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence46(12), 9286–9302 (2024)

  48. [48]

    Pattern Recognition Letters (2025)

    Reichard, K., Rizzoli, G., Gasperini, S., Hoyer, L., Zanuttigh, P., Navab, N., Tombari, F.: From open-vocabulary to vocabulary-free semantic segmentation. Pattern Recognition Letters (2025)

  49. [49]

    In: International conference on machine learning

    Sala, F., De Sa, C., Gu, A., Ré, C.: Representation tradeoffs for hyperbolic em- beddings. In: International conference on machine learning. pp. 4460–4469 (2018)

  50. [50]

    In: International symposium on graph drawing

    Sarkar, R.: Low distortion delaunay embedding of trees in hyperbolic plane. In: International symposium on graph drawing. pp. 355–366 (2011)

  51. [51]

    arXiv preprint arXiv:2504.04841 (2025)

    Schmidt, S., Körner, J., Fuchsgruber, D., Gasperini, S., Tombari, F., Günnemann, S.: Prior2former–evidential modeling of mask transformers for assumption-free open-world panoptic segmentation. arXiv preprint arXiv:2504.04841 (2025)

  52. [52]

    Advances in Neural Information Processing Systems 37, 91220–91259 (2024)

    Sinha, A., Zeng, S., Yamada, M., Zhao, H.: Learning structured representations with hyperbolic embeddings. Advances in Neural Information Processing Systems 37, 91220–91259 (2024)

  53. [53]

    Applied Sciences15(16), 9087 (2025)

    Ungur, V., Popa, C.A.: Openmamba: Introducing state space models to open- vocabulary semantic segmentation. Applied Sciences15(16), 9087 (2025)

  54. [54]

    arXiv preprint arXiv:2303.10147 (2023)

    Vödisch, N., Petek, K., Burgard, W., Valada, A.: Codeps: Online continual learning for depth estimation and panoptic segmentation. arXiv preprint arXiv:2303.10147 (2023)

  55. [55]

    IEEE Robotics and Automation Letters10(1), 216–223 (2024)

    Vödisch, N., Petek, K., Käppeler, M., Valada, A., Burgard, W.: A good foundation is worth many labels: Label-efficient panoptic segmentation. IEEE Robotics and Automation Letters10(1), 216–223 (2024)

  56. [56]

    In: Conference on Robot Learning

    Wong, K., Wang, S., Ren, M., Liang, M., Urtasun, R.: Identifying unknown in- stances for autonomous driving. In: Conference on Robot Learning. pp. 384–393 (2020)

  57. [57]

    In: British Machine Vision Conference (2022)

    Xu, H., Chen, H., Liu, L., Yin, Y.: Dual decision improves open-set panoptic seg- mentation. In: British Machine Vision Conference (2022)

  58. [58]

    In: Proceedings of the AAAI Conference on Artificial Intelligence

    Yin, Y., Chen, H., Zhou, W., Deng, J., Xu, H., Li, H.: Revisiting open-set panoptic segmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 38, pp. 6747–6754 (2024) Supplementary Material Hyp2Former: Hierarchy-Aware Hyperbolic Embeddings for Open-Set Panoptic Segmentation Yao Lu1, Rohit Mohan1, Florian Drews2, Yakov Miron2, a...