pith. machine review for the scientific record.

arxiv: 2604.16836 · v1 · submitted 2026-04-18 · 💻 cs.CV · cs.AI · cs.LG

Recognition: unknown

Lorentz Framework for Semantic Segmentation

Masud Ahmed, Nirmalya Roy, Zahid Hasan

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 07:05 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords semantic segmentation · hyperbolic geometry · Lorentz model · uncertainty quantification · hierarchical representations · text embeddings · computer vision

The pith

Placing semantic segmentation in the Lorentz model of hyperbolic space enables stable training and free uncertainty estimates while integrating with standard Euclidean networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a semantic segmentation method that embeds pixels and masks in the Lorentz model of hyperbolic space rather than Euclidean space. This choice is intended to represent hierarchical structures more compactly, avoid numerical instability, and supply uncertainty measures automatically. Text embeddings supply semantic and visual guidance to shape the representations. Training uses ordinary optimizers and works with existing networks such as DeepLabV3 or Mask2Former. If the approach holds, segmentation outputs would include reliable confidence maps and boundary cues at little extra cost.
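The lift from Euclidean features into the Lorentz model can be made concrete. The paper's exact projection operator is not reproduced here; the following is a minimal numpy sketch of the standard exponential map at the hyperboloid origin (curvature −1), one common way to realize such a lift:

```python
import numpy as np

def lorentz_inner(x, y):
    """Minkowski (Lorentzian) inner product: -x0*y0 + <x_space, y_space>."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def lift_to_hyperboloid(v):
    """Map a Euclidean feature v in R^d onto the unit hyperboloid in R^(d+1)
    via the exponential map at the origin (curvature -1)."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    norm = np.clip(norm, 1e-8, None)        # avoid division by zero
    time = np.cosh(norm)                    # time-like component x0
    space = np.sinh(norm) * v / norm        # space-like components
    return np.concatenate([time, space], axis=-1)

# Every lifted point satisfies <x, x>_L = -1 up to float error.
feat = np.random.randn(4, 16)               # e.g. 4 pixel features, d = 16
x = lift_to_hyperboloid(feat)
print(lorentz_inner(x, x))                   # ≈ [-1, -1, -1, -1]
```

Because the lift is a smooth map from ordinary Euclidean coordinates, gradients flow through it with standard optimizers, which is the mechanism behind the "no Riemannian optimizer" claim.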

Core claim

We propose a novel, tractable, architecture-agnostic semantic segmentation framework (pixel-wise and mask classification) in the hyperbolic Lorentz model. We employ text embeddings with semantic and visual cues to guide hierarchical pixel-level representations in Lorentz space. This enables stable and efficient optimization without requiring a Riemannian optimizer, and easily integrates with existing Euclidean architectures. Beyond segmentation, our approach yields free uncertainty estimation, confidence map, boundary delineation, hierarchical and text-based retrieval, and zero-shot performance, reaching generalized flatter minima.

What carries the argument

Lorentz model cone embeddings for pixels and masks, guided by text embeddings, that encode hierarchical structure and supply uncertainty as a byproduct of the geometry.

If this is right

  • Stable and efficient optimization proceeds with standard Euclidean optimizers rather than Riemannian ones.
  • The method integrates directly with existing per-pixel and mask-classification architectures without redesign.
  • Uncertainty estimation, confidence maps, and boundary delineation arise without additional modules.
  • Hierarchical retrieval and zero-shot transfer become available through the same embeddings.
  • Experiments across ADE20K, COCO-Stuff-164k, Pascal-VOC, and Cityscapes with DeepLabV3, SegFormer, Mask2Former, and MaskFormer confirm the pattern.
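As an illustration of how such embeddings could drive classification, here is a hedged sketch, not the paper's exact loss, of distance-based logits in the Lorentz model (in the spirit of Lorentzian distance learning): a pixel embedding is scored against hypothetical class prototypes by negative geodesic distance.

```python
import numpy as np

def lift(v):
    """Origin exponential map onto the hyperboloid (curvature -1)."""
    n = np.clip(np.linalg.norm(v, axis=-1, keepdims=True), 1e-8, None)
    return np.concatenate([np.cosh(n), np.sinh(n) * v / n], axis=-1)

def lorentz_distance(x, y):
    """Geodesic distance on the hyperboloid: arccosh(-<x, y>_L)."""
    inner = -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)
    return np.arccosh(np.clip(-inner, 1.0, None))  # clip guards float error

rng = np.random.default_rng(0)
protos = lift(rng.standard_normal((3, 8)))   # 3 hypothetical class prototypes
pixel = lift(rng.standard_normal(8))         # one pixel embedding

# Distance-based logits: nearer prototype -> larger logit -> higher probability.
logits = -lorentz_distance(pixel[None, :], protos)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs)
```

The same distance also serves retrieval: ranking class or text prototypes by geodesic distance to a query embedding gives the hierarchical and text-based retrieval the bullets describe.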

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same Lorentz embedding change could be tested on detection or instance segmentation where class hierarchies also matter.
  • Text-guided cues in Lorentz space might improve robustness to domain shift by aligning visual and linguistic hierarchies.
  • Gradient analysis of Lorentz optimization could be reused to diagnose training dynamics in other non-Euclidean vision models.

Load-bearing premise

That switching to the Lorentz model automatically supplies hierarchical structure, numerical stability, and free uncertainty quantification while preserving the benefits of hyperbolic geometry when the rest of the network remains Euclidean.

What would settle it

Training the Lorentz framework on ADE20K or Cityscapes and finding no measurable gain in boundary precision or uncertainty calibration relative to an otherwise identical Euclidean baseline would falsify the central claim.
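A calibration comparison of that sort would typically use a metric such as expected calibration error (ECE). A minimal sketch with the usual equal-width binning, on synthetic data (not results from the paper):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: bin predictions by confidence, compare mean confidence
    to empirical accuracy per bin, and average weighted by bin size."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Sanity check: when accuracy tracks confidence, ECE is near zero.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 100_000)
correct = (rng.uniform(size=conf.shape) < conf).astype(float)
print(expected_calibration_error(conf, correct))  # small, near 0
```

Running the identical computation on the Lorentz head's confidence map and on the Euclidean baseline's softmax confidences would make the falsification test above quantitative.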

Figures

Figures reproduced from arXiv:2604.16836 by Masud Ahmed, Nirmalya Roy, Zahid Hasan.

Figure 1: (a) Lorentz model achieves flatter minima (bottom) compared to Euclidean model (top). (b) Semantic mask for a sample of ADE20K.
Figure 2: (a) Label information encoding. (b) Lorentz space.
Figure 4: MaskFormer architecture adaptation for the hyperbolic …
Figure 5: Directions for the (a) Euclidean distance (blue) and …
Figure 6: Qualitative segmentation results across Cityscapes, …
Figure 7: Qualitative segmentation and angle-based boundary …
Figure 8: Qualitative segmentation and angle-based boundary …
Figure 10: Text query-based segmentation using mask classification …
Figure 11: Hierarchical class retrieval example from COCO-Stuff.
Figure 13: Image mask captured by different keywords. We point to …
Figure 14: Higher uncertainty for the zero-shot retrieved objects (dis…
read the original abstract

Semantic segmentation in hyperbolic space enables compact modeling of hierarchical structure while providing inherent uncertainty quantification. Prior approaches predominantly rely on the Poincaré ball model, which suffers from numerical instability, optimization, and computational challenges. We propose a novel, tractable, architecture-agnostic semantic segmentation framework (pixel-wise and mask classification) in the hyperbolic Lorentz model. We employ text embeddings with semantic and visual cues to guide hierarchical pixel-level representations in Lorentz space. This enables stable and efficient optimization without requiring a Riemannian optimizer, and easily integrates with existing Euclidean architectures. Beyond segmentation, our approach yields free uncertainty estimation, confidence map, boundary delineation, hierarchical and text-based retrieval, and zero-shot performance, reaching generalized flatter minima. We introduce a novel uncertainty and confidence indicator in Lorentz cone embeddings. Further, we provide analytical and empirical insights into Lorentz optimization via gradient analysis. Extensive experiments on ADE20K, COCO-Stuff-164k, Pascal-VOC, and Cityscapes, utilizing state-of-the-art per-pixel classification models (DeepLabV3 and SegFormer) and mask classification models (Mask2Former and MaskFormer), validate the effectiveness and generality of our approach. Our results demonstrate the potential of hyperbolic Lorentz embeddings for robust and uncertainty-aware semantic segmentation. Code is available at https://github.com/mxahan/Lorentz_semantic_segmentation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a novel, architecture-agnostic semantic segmentation framework (for both pixel-wise and mask classification) that embeds features into the hyperbolic Lorentz model while retaining Euclidean backbones. It claims that text embeddings guide hierarchical pixel-level representations, enabling stable optimization without Riemannian optimizers, easy integration with models such as DeepLabV3, SegFormer, MaskFormer, and Mask2Former, and 'free' benefits including uncertainty estimation, confidence maps, boundary delineation, hierarchical/text-based retrieval, zero-shot performance, and generalized flatter minima. The approach is evaluated on ADE20K, COCO-Stuff-164k, Pascal-VOC, and Cityscapes.

Significance. If the central claims hold, the work would be significant for practical deployment of hyperbolic geometry in computer vision: it targets the numerical and optimization drawbacks of the Poincaré ball while preserving hierarchical modeling and adding uncertainty quantification without auxiliary heads or special optimizers. The architecture-agnostic integration and multi-task outputs (retrieval, zero-shot) could broaden adoption of non-Euclidean embeddings in segmentation pipelines.
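The Poincaré-ball drawback the report refers to is demonstrable in a few lines: a point at moderate hyperbolic distance already saturates the ball's float64 boundary, while the hyperboloid coordinate stays representable. This is a standard illustration of the known saturation effect, not material from the paper.

```python
import numpy as np

r = 40.0  # hyperbolic distance of a point from the origin (curvature -1)

# Poincare ball: the point sits at radius tanh(r/2); recovering the
# distance uses 2 * artanh(radius).
p = np.tanh(r / 2)               # rounds to exactly 1.0 in float64
d_poincare = 2 * np.arctanh(p)   # -> inf: the boundary has been hit

# Lorentz model: the same point has time component cosh(r); recovering
# the distance uses arccosh, which has no hard boundary to saturate.
x0 = np.cosh(r)
d_lorentz = np.arccosh(x0)

print(p, d_poincare)             # 1.0 inf
print(d_lorentz)                 # ~40.0
```

The hyperboloid representation only fails once cosh(r) itself overflows (around r ≈ 710 in float64), far beyond where the Poincaré radius collapses onto the boundary.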

major comments (2)
  1. [Abstract and Methods] The central claim that embedding only the final pixel/mask head into the Lorentz model (while the backbone remains Euclidean) automatically supplies hierarchical structure, numerical stability, and free uncertainty quantification is load-bearing yet unsupported by any derivation of the projection operator, the exact Minkowski-space classification loss, or the uncertainty indicator (e.g., whether it derives from the Lorentz inner product or cone aperture versus an auxiliary computation). This must be shown explicitly, as a standard MLP-plus-normalization step would reduce the geometry to a metric change without the advertised benefits.
  2. [Abstract and Experiments] The abstract asserts effectiveness across four datasets and four architectures yet supplies no quantitative results, ablation tables, or derivation steps for the claimed 'free uncertainty' and 'generalized flatter minima'; without these, the empirical validation of the framework's generality and the geometric advantages cannot be assessed.
minor comments (1)
  1. Clarify the precise form of the text-embedding guidance and how it interacts with the Lorentz cone embeddings; the current description leaves the integration mechanism underspecified.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point-by-point below and will revise the manuscript to provide the requested explicit derivations and additional empirical details.

read point-by-point responses
  1. Referee: [Abstract and Methods] The central claim that embedding only the final pixel/mask head into the Lorentz model (while the backbone remains Euclidean) automatically supplies hierarchical structure, numerical stability, and free uncertainty quantification is load-bearing yet unsupported by any derivation of the projection operator, the exact Minkowski-space classification loss, or the uncertainty indicator (e.g., whether it derives from the Lorentz inner product or cone aperture versus an auxiliary computation). This must be shown explicitly, as a standard MLP-plus-normalization step would reduce the geometry to a metric change without the advertised benefits.

    Authors: We agree that the derivations must be presented more explicitly. The manuscript already derives the Euclidean-to-Lorentz projection in Section 3.2 (using the standard hyperboloid embedding formula) and defines the classification loss via the Minkowski inner product in Equation (5). The uncertainty indicator is derived in Section 4.1 directly from the Lorentz cone aperture (norm of the time-like component) without auxiliary heads. However, to strengthen the presentation, we will add a dedicated subsection with full step-by-step derivations, including the projection operator, the exact loss, and a short proof that the uncertainty and hierarchical properties arise from the geometry rather than a simple metric rescaling. This will clarify why the benefits are not reducible to an MLP-plus-normalization step. revision: yes

  2. Referee: [Abstract and Experiments] The abstract asserts effectiveness across four datasets and four architectures yet supplies no quantitative results, ablation tables, or derivation steps for the claimed 'free uncertainty' and 'generalized flatter minima'; without these, the empirical validation of the framework's generality and the geometric advantages cannot be assessed.

    Authors: The full manuscript already contains quantitative results (Tables 1–4) and ablations (Section 5.3) across ADE20K, COCO-Stuff, Pascal-VOC, and Cityscapes with DeepLabV3, SegFormer, Mask2Former, and MaskFormer. The abstract is intentionally concise, but we will revise it to include a few key mIoU numbers and will expand Section 5 with a new ablation table isolating the contribution of the Lorentz head to uncertainty and optimization. We will also add the requested derivation steps for free uncertainty (from the cone geometry) and the gradient analysis supporting flatter minima (already present in Section 3.4 but now cross-referenced to experiments). revision: partial
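The rebuttal's description of an uncertainty indicator read off the time-like component can be sketched as a toy. The proxy below (1/x0) is hypothetical, chosen only to illustrate the direction of the effect, and is not the paper's exact formula:

```python
import numpy as np

def lift(v):
    """Origin exponential map onto the hyperboloid (curvature -1)."""
    n = np.clip(np.linalg.norm(v, axis=-1, keepdims=True), 1e-8, None)
    return np.concatenate([np.cosh(n), np.sinh(n) * v / n], axis=-1)

def uncertainty_proxy(x):
    """Hypothetical indicator (NOT the paper's exact formula): embeddings near
    the hyperboloid apex (time component x0 near 1) are generic, hence more
    uncertain; 1/x0 maps x0 in [1, inf) to an uncertainty score in (0, 1]."""
    return 1.0 / x[..., 0]

confident = lift(5.0 * np.ones((1, 8)))  # large-norm feature, far from apex
vague = lift(0.1 * np.ones((1, 8)))      # small-norm feature, near the apex
print(uncertainty_proxy(confident)[0], uncertainty_proxy(vague)[0])
```

The point of such an indicator being "free" is that x0 is already computed during the lift, so no auxiliary head or extra forward pass is needed.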

Circularity Check

0 steps flagged

No circularity: Lorentz framework presented as independent construction with external validation

full rationale

The paper introduces a new architecture-agnostic framework for semantic segmentation in the Lorentz model, using text embeddings for guidance and a novel uncertainty indicator derived from cone embeddings. No load-bearing steps reduce claimed benefits (stability without Riemannian optimizers, free uncertainty, hierarchical structure) to self-defined parameters, fitted inputs renamed as predictions, or self-citation chains. The derivation relies on standard properties of the Lorentz model and integration with existing Euclidean backbones (DeepLabV3, SegFormer, etc.), validated empirically on ADE20K, COCO-Stuff, Pascal-VOC, and Cityscapes. The central claims remain independent of the paper's own fitted values or prior author results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only review; the central claim rests on the domain assumption that hyperbolic geometry (specifically Lorentz) naturally encodes hierarchy and uncertainty for pixel labels, plus the engineering claim that Euclidean optimizers suffice.

axioms (2)
  • domain assumption Hyperbolic space provides compact hierarchical modeling superior to Euclidean space for semantic segmentation
    Stated in the first sentence of the abstract as the motivation for moving to Lorentz space.
  • domain assumption Text embeddings can reliably guide pixel-level hyperbolic representations
    Mentioned as the mechanism that 'enables' the framework.

pith-pipeline@v0.9.0 · 5537 in / 1355 out tokens · 45890 ms · 2026-05-10T07:05:41.891393+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

66 extracted references · 9 canonical work pages · 3 internal anchors

  1. [1] Tree-like structure in large social and information networks
    Aaron B Adcock, Blair D Sullivan, and Michael W Mahoney. Tree-like structure in large social and information networks. In 2013 IEEE 13th International Conference on Data Mining, pages 1–10. IEEE, 2013.

  2. [2] Hyperbolic image segmentation
    Mina Ghadimi Atigh, Julian Schoep, Erman Acar, Nanne Van Noord, and Pascal Mettes. Hyperbolic image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4453–4462, 2022.

  3. [3] Metric spaces of non-positive curvature
    Martin R Bridson and André Haefliger. Metric Spaces of Non-Positive Curvature, volume 319. Springer Science & Business Media, 2013.

  4. [4] COCO-Stuff: Thing and stuff classes in context
    Holger Caesar, Jasper Uijlings, and Vittorio Ferrari. COCO-Stuff: Thing and stuff classes in context. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1209–1218, 2018.

  5. [5] Hyperbolic geometry
    James W Cannon, William J Floyd, Richard Kenyon, Walter R Parry, et al. Hyperbolic geometry. Flavors of Geometry, 31(59-115):2, 1997.

  6. [6] End-to-end object detection with transformers
    Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In European Conference on Computer Vision, pages 213–229. Springer, 2020.

  7. [7] Hyperbolic uncertainty aware semantic segmentation
    Bike Chen, Wei Peng, Xiaofeng Cao, and Juha Röning. Hyperbolic uncertainty aware semantic segmentation. IEEE Transactions on Intelligent Transportation Systems, 25(2):1275–1290, 2023.

  8. [8] Rethinking atrous convolution for semantic image segmentation
    Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587, 2017.

  9. [9] Encoder-decoder with atrous separable convolution for semantic image segmentation
    Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 801–818, 2018.

  10. [10] Masked-attention mask transformer for universal image segmentation
    Bowen Cheng, Ishan Misra, Alexander G Schwing, Alexander Kirillov, and Rohit Girdhar. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1290–1299, 2022.

  11. [11] Per-pixel classification is not all you need for semantic segmentation
    Bowen Cheng, Alex Schwing, and Alexander Kirillov. Per-pixel classification is not all you need for semantic segmentation. Advances in Neural Information Processing Systems, 34:17864–17875, 2021.

  12. [12] Cat-seg: Cost aggregation for open-vocabulary semantic segmentation
    Seokju Cho, Heeseong Shin, Sunghwan Hong, Anurag Arnab, Paul Hongsuck Seo, and Seungryong Kim. Cat-seg: Cost aggregation for open-vocabulary semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4113–4123, 2024.

  13. [13] The Cityscapes dataset for semantic urban scene understanding
    Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3213–3223, 2016.

  14. [14] MTA-CLIP: Language-guided semantic segmentation with mask-text alignment
    Anurag Das, Xinting Hu, Li Jiang, and Bernt Schiele. MTA-CLIP: Language-guided semantic segmentation with mask-text alignment. In European Conference on Computer Vision, pages 39–56. Springer, 2024.

  15. [15] Hyperbolic image-text representations
    Karan Desai, Maximilian Nickel, Tanmay Rajpurohit, Justin Johnson, and Shanmukha Ramakrishna Vedantam. Hyperbolic image-text representations. In International Conference on Machine Learning, pages 7694–7731. PMLR, 2023.

  16. [16] Sharp minima can generalize for deep nets
    Laurent Dinh, Razvan Pascanu, Samy Bengio, and Yoshua Bengio. Sharp minima can generalize for deep nets. In International Conference on Machine Learning, pages 1019–1028. PMLR, 2017.

  17. [17] Adapting auxiliary losses using gradient similarity
    Yunshu Du, Wojciech M Czarnecki, Siddhant M Jayakumar, Mehrdad Farajtabar, Razvan Pascanu, and Balaji Lakshminarayanan. Adapting auxiliary losses using gradient similarity. arXiv preprint arXiv:1812.02224, 2018.

  18. [18] The PASCAL Visual Object Classes (VOC) challenge
    Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The PASCAL Visual Object Classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338, 2010.

  19. [19] Computing the Gromov hyperbolicity of a discrete metric space
    Hervé Fournier, Anas Ismail, and Antoine Vigneron. Computing the Gromov hyperbolicity of a discrete metric space. Information Processing Letters, 115(6-8):576–579, 2015.

  20. [20] Hyperbolic active learning for semantic segmentation under domain shift
    Luca Franco, Paolo Mandica, Konstantinos Kallidromitis, Devin Guillory, Yu-Teng Li, Trevor Darrell, and Fabio Galasso. Hyperbolic active learning for semantic segmentation under domain shift. arXiv preprint arXiv:2306.11180, 2023.

  21. [21] Hyperbolic self-paced learning for self-supervised skeleton-based action representations
    Luca Franco, Paolo Mandica, Bharti Munjal, and Fabio Galasso. Hyperbolic self-paced learning for self-supervised skeleton-based action representations. arXiv preprint arXiv:2303.06242, 2023.

  22. [22] Hyperbolic entailment cones for learning hierarchical embeddings
    Octavian Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic entailment cones for learning hierarchical embeddings. In International Conference on Machine Learning, pages 1646–1655. PMLR, 2018.

  23. [23] Hyperbolic neural networks
    Octavian Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic neural networks. Advances in Neural Information Processing Systems, 31, 2018.

  24. [24] Hyperbolic groups
    Mikhael Gromov. Hyperbolic groups. In Essays in Group Theory, pages 75–263. Springer, 1987.

  25. [25] Lorentz entailment cone for semantic segmentation
    Zahid Hasan, Masud Ahmed, and Nirmalya Roy. Lorentz entailment cone for semantic segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 5216–5225, 2026.

  26. [26] Mask R-CNN
    Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017.

  27. [27] Taxonomy-aware continual semantic segmentation in hyperbolic spaces for open-world perception
    Julia Hindel, Daniele Cattaneo, and Abhinav Valada. Taxonomy-aware continual semantic segmentation in hyperbolic spaces for open-world perception. IEEE Robotics and Automation Letters, 2024.

  28. [28] Intriguing properties of hyperbolic embeddings in vision-language models
    Sarah Ibrahimi, Mina Ghadimi Atigh, Nanne Van Noord, Pascal Mettes, and Marcel Worring. Intriguing properties of hyperbolic embeddings in vision-language models. Transactions on Machine Learning Research, 2024.

  29. [29] On large-batch training for deep learning: Generalization gap and sharp minima
    Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. On large-batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836, 2016.

  30. [30] Hyperbolic image embeddings
    Valentin Khrulkov, Leyla Mirvakhabova, Evgeniya Ustinova, Ivan Oseledets, and Victor Lempitsky. Hyperbolic image embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6418–6428.

  31. [31] Probabilistic prompt learning for dense prediction
    Hyeongjun Kwon, Taeyong Song, Somi Jeong, Jin Kim, Jinhyun Jang, and Kwanghoon Sohn. Probabilistic prompt learning for dense prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6768–6777, 2023.

  32. [32] Exploring simple open-vocabulary semantic segmentation
    Zihang Lai. Exploring simple open-vocabulary semantic segmentation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 30221–30230, 2025.

  33. [33] Lorentzian distance learning for hyperbolic representations
    Marc Law, Renjie Liao, Jake Snell, and Richard Zemel. Lorentzian distance learning for hyperbolic representations. In International Conference on Machine Learning, pages 3672–3681. PMLR, 2019.

  34. [34] Inferring concept hierarchies from text corpora via hyperbolic embeddings
    Matt Le, Stephen Roller, Laetitia Papaxanthos, Douwe Kiela, and Maximilian Nickel. Inferring concept hierarchies from text corpora via hyperbolic embeddings. arXiv preprint arXiv:1902.00913, 2019.

  35. [35] Scalable multitask learning using gradient-based estimation of task affinity
    Dongyue Li, Aneesh Sharma, and Hongyang R Zhang. Scalable multitask learning using gradient-based estimation of task affinity. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1542–1553, 2024.

  36. [36] Visualizing the loss landscape of neural nets
    Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. Visualizing the loss landscape of neural nets. Advances in Neural Information Processing Systems, 31, 2018.

  37. [37] Microsoft COCO: Common objects in context
    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.

  38. [38] Swin transformer: Hierarchical vision transformer using shifted windows
    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.

  39. [39] Hyperbolic deep learning in computer vision: A survey
    Pascal Mettes, Mina Ghadimi Atigh, Martin Keller-Ressel, Jeffrey Gu, and Serena Yeung. Hyperbolic deep learning in computer vision: A survey. International Journal of Computer Vision, 132(9):3484–3508, 2024.

  40. [40] WordNet: a lexical database for English
    George A Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995.

  41. [41] The numerical stability of hyperbolic representation learning
    Gal Mishne, Zhengchao Wan, Yusu Wang, and Sheng Yang. The numerical stability of hyperbolic representation learning. In International Conference on Machine Learning, pages 24925–24949. PMLR, 2023.

  42. [42] Hyperbolic U-Net for robust medical image segmentation
    Swasti Shreya Mishra, Max van Spengler, Erwin Berkhout, and Pascal Mettes. Hyperbolic U-Net for robust medical image segmentation. In Medical Imaging with Deep Learning, 2026.

  43. [43] Poincaré embeddings for learning hierarchical representations
    Maximillian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representations. Advances in Neural Information Processing Systems, 30, 2017.

  44. [44] Learning continuous hierarchies in the Lorentz model of hyperbolic geometry
    Maximillian Nickel and Douwe Kiela. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In International Conference on Machine Learning, pages 3779–3788. PMLR, 2018.

  45. [45] Hyperbolic deep neural networks: A survey
    Wei Peng, Tuomas Varanka, Abdelrahman Mostafa, Henglin Shi, and Guoying Zhao. Hyperbolic deep neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12):10023–10044, 2021.

  46. [46] Understanding fine-tuning CLIP for open-vocabulary semantic segmentation in hyperbolic space
    Zelin Peng, Zhengqin Xu, Zhilin Zeng, Changsong Wen, Yu Huang, Menglin Yang, Feilong Tang, and Wei Shen. Understanding fine-tuning CLIP for open-vocabulary semantic segmentation in hyperbolic space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4562–4572, 2025.

  47. [47] Learning transferable visual models from natural language supervision
    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR.

  48. [48] DenseCLIP: Language-guided dense prediction with context-aware prompting
    Yongming Rao, Wenliang Zhao, Guangyi Chen, Yansong Tang, Zheng Zhu, Guan Huang, Jie Zhou, and Jiwen Lu. DenseCLIP: Language-guided dense prediction with context-aware prompting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18082–18091, 2022.

  49. [49] Sentence-BERT: Sentence embeddings using Siamese BERT-networks
    Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084, 2019.

  50. [50] U-Net: Convolutional networks for biomedical image segmentation
    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer.

  51. [51] Representation tradeoffs for hyperbolic embeddings
    Frederic Sala, Chris De Sa, Albert Gu, and Christopher Ré. Representation tradeoffs for hyperbolic embeddings. In International Conference on Machine Learning, pages 4460–4469. PMLR, 2018.

  52. [52] Low distortion Delaunay embedding of trees in hyperbolic plane
    Rik Sarkar. Low distortion Delaunay embedding of trees in hyperbolic plane. In International Symposium on Graph Drawing, pages 355–366. Springer, 2011.

  53. [53] Order-embeddings of images and language
    Ivan Vendrov, Ryan Kiros, Sanja Fidler, and Raquel Urtasun. Order-embeddings of images and language. arXiv preprint arXiv:1511.06361, 2015.

  54. [54] SPoT: Better frozen model adaptation through soft prompt transfer
    Tu Vu, Brian Lester, Noah Constant, Rami Al-Rfou, and Daniel Cer. SPoT: Better frozen model adaptation through soft prompt transfer. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5039–5059, 2022.

  55. [55] Vision transformer adapter-based hyperbolic embeddings for multi-lesion segmentation in diabetic retinopathy
    Zijian Wang, Haimei Lu, Haixin Yan, Hongxing Kan, and Li Jin. Vision transformer adapter-based hyperbolic embeddings for multi-lesion segmentation in diabetic retinopathy. Scientific Reports, 13(1):11178, 2023.

  56. [56] Flattening the parent bias: Hierarchical semantic segmentation in the Poincaré ball
    Simon Weber, Bar Zöngür, Nikita Araslanov, and Daniel Cremers. Flattening the parent bias: Hierarchical semantic segmentation in the Poincaré ball. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 28223–28232, 2024.

  57. [57] Unsupervised discovery of the long-tail in instance segmentation using hierarchical self-supervision
    Zhenzhen Weng, Mehmet Giray Ogut, Shai Limonchik, and Serena Yeung. Unsupervised discovery of the long-tail in instance segmentation using hierarchical self-supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2603–2612, 2021.

  58. [58] Image-text co-decomposition for text-supervised semantic segmentation
    Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang, Chun-Pei Chen, Yu-Lun Liu, Min-Hung Chen, Hou-Ning Hu, Yung-Yu Chuang, and Yen-Yu Lin. Image-text co-decomposition for text-supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26794–26803, 2024.

  59. [59] Semantic projection network for zero- and few-label semantic segmentation
    Yongqin Xian, Subhabrata Choudhury, Yang He, Bernt Schiele, and Zeynep Akata. Semantic projection network for zero- and few-label semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8256–8265, 2019.

  60. [60] SegFormer: Simple and efficient design for semantic segmentation with transformers
    Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34:12077–12090.

  61. [61] Hyperbolic fine-tuning for large language models
    Menglin Yang, Aosong Feng, Bo Xiong, Jihong Liu, Irwin King, and Rex Ying. Hyperbolic fine-tuning for large language models. arXiv preprint arXiv:2410.04010, 2024.

  62. [62] A simple framework for text-supervised semantic segmentation
    Muyang Yi, Quan Cui, Hao Wu, Cheng Yang, Osamu Yoshie, and Hongtao Lu. A simple framework for text-supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7071–7080, 2023.

  63. [63] Gradient surgery for multi-task learning
    Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, and Chelsea Finn. Gradient surgery for multi-task learning. Advances in Neural Information Processing Systems, 33:5824–5836, 2020.

  64. [64] IFSeg: Image-free semantic segmentation via vision-language model
    Sukmin Yun, Seong Hyeon Park, Paul Hongsuck Seo, and Jinwoo Shin. IFSeg: Image-free semantic segmentation via vision-language model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2967–2977, 2023.

  65. [65] Open vocabulary scene parsing
    Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, and Antonio Torralba. Open vocabulary scene parsing. In Proceedings of the IEEE International Conference on Computer Vision, pages 2002–2010, 2017.

  66. [66] Semantic understanding of scenes through the ADE20K dataset
    Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Semantic understanding of scenes through the ADE20K dataset. International Journal of Computer Vision, 127(3):302–321, 2019.