pith. machine review for the scientific record.

arxiv: 2604.16836 · v1 · submitted 2026-04-18 · 💻 cs.CV · cs.AI · cs.LG

Recognition: unknown

Lorentz Framework for Semantic Segmentation

Masud Ahmed, Nirmalya Roy, Zahid Hasan

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 07:05 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords semantic segmentation · hyperbolic geometry · Lorentz model · uncertainty quantification · hierarchical representations · text embeddings · computer vision

The pith

Placing semantic segmentation in the Lorentz model of hyperbolic space enables stable training and free uncertainty estimates while integrating with standard Euclidean networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a semantic segmentation method that embeds pixels and masks in the Lorentz model of hyperbolic space rather than Euclidean space. This choice is intended to represent hierarchical structures more compactly, avoid numerical instability, and supply uncertainty measures automatically. Text embeddings supply semantic and visual guidance to shape the representations. Training uses ordinary optimizers and works with existing networks such as DeepLabV3 or Mask2Former. If the approach holds, segmentation outputs would include reliable confidence maps and boundary cues at little extra cost.
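The lift from Euclidean features into the Lorentz model can be made concrete. The paper's exact projection operator is not reproduced here; the following is a minimal numpy sketch of the standard exponential map at the hyperboloid origin (curvature −1), one common way to realize such a lift:

```python
import numpy as np

def lorentz_inner(x, y):
    """Minkowski (Lorentzian) inner product: -x0*y0 + <x_space, y_space>."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def lift_to_hyperboloid(v):
    """Map a Euclidean feature v in R^d onto the unit hyperboloid in R^(d+1)
    via the exponential map at the origin (curvature -1)."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    norm = np.clip(norm, 1e-8, None)        # avoid division by zero
    time = np.cosh(norm)                    # time-like component x0
    space = np.sinh(norm) * v / norm        # space-like components
    return np.concatenate([time, space], axis=-1)

# Every lifted point satisfies <x, x>_L = -1 up to float error.
feat = np.random.randn(4, 16)               # e.g. 4 pixel features, d = 16
x = lift_to_hyperboloid(feat)
print(lorentz_inner(x, x))                   # ≈ [-1, -1, -1, -1]
```

Because the lift is a smooth map from ordinary Euclidean coordinates, gradients flow through it with standard optimizers, which is the mechanism behind the "no Riemannian optimizer" claim.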

Core claim

We propose a novel, tractable, architecture-agnostic semantic segmentation framework (pixel-wise and mask classification) in the hyperbolic Lorentz model. We employ text embeddings with semantic and visual cues to guide hierarchical pixel-level representations in Lorentz space. This enables stable and efficient optimization without requiring a Riemannian optimizer, and easily integrates with existing Euclidean architectures. Beyond segmentation, our approach yields free uncertainty estimation, confidence map, boundary delineation, hierarchical and text-based retrieval, and zero-shot performance, reaching generalized flatter minima.

What carries the argument

Lorentz model cone embeddings for pixels and masks, guided by text embeddings, that encode hierarchical structure and supply uncertainty as a byproduct of the geometry.

If this is right

  • Stable and efficient optimization proceeds with standard Euclidean optimizers rather than Riemannian ones.
  • The method integrates directly with existing per-pixel and mask-classification architectures without redesign.
  • Uncertainty estimation, confidence maps, and boundary delineation arise without additional modules.
  • Hierarchical retrieval and zero-shot transfer become available through the same embeddings.
  • Experiments across ADE20K, COCO-Stuff-164k, Pascal-VOC, and Cityscapes with DeepLabV3, SegFormer, Mask2Former, and MaskFormer confirm the pattern.
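As an illustration of how such embeddings could drive classification, here is a hedged sketch, not the paper's exact loss, of distance-based logits in the Lorentz model (in the spirit of Lorentzian distance learning): a pixel embedding is scored against hypothetical class prototypes by negative geodesic distance.

```python
import numpy as np

def lift(v):
    """Origin exponential map onto the hyperboloid (curvature -1)."""
    n = np.clip(np.linalg.norm(v, axis=-1, keepdims=True), 1e-8, None)
    return np.concatenate([np.cosh(n), np.sinh(n) * v / n], axis=-1)

def lorentz_distance(x, y):
    """Geodesic distance on the hyperboloid: arccosh(-<x, y>_L)."""
    inner = -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)
    return np.arccosh(np.clip(-inner, 1.0, None))  # clip guards float error

rng = np.random.default_rng(0)
protos = lift(rng.standard_normal((3, 8)))   # 3 hypothetical class prototypes
pixel = lift(rng.standard_normal(8))         # one pixel embedding

# Distance-based logits: nearer prototype -> larger logit -> higher probability.
logits = -lorentz_distance(pixel[None, :], protos)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs)
```

The same distance also serves retrieval: ranking class or text prototypes by geodesic distance to a query embedding gives the hierarchical and text-based retrieval the bullets describe.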

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same Lorentz embedding change could be tested on detection or instance segmentation where class hierarchies also matter.
  • Text-guided cues in Lorentz space might improve robustness to domain shift by aligning visual and linguistic hierarchies.
  • Gradient analysis of Lorentz optimization could be reused to diagnose training dynamics in other non-Euclidean vision models.

Load-bearing premise

That switching to the Lorentz model automatically supplies hierarchical structure, numerical stability, and free uncertainty quantification while preserving the benefits of hyperbolic geometry when the rest of the network remains Euclidean.

What would settle it

Training the Lorentz framework on ADE20K or Cityscapes and finding no measurable gain in boundary precision or uncertainty calibration relative to an otherwise identical Euclidean baseline would falsify the central claim.
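A calibration comparison of that sort would typically use a metric such as expected calibration error (ECE). A minimal sketch with the usual equal-width binning, on synthetic data (not results from the paper):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: bin predictions by confidence, compare mean confidence
    to empirical accuracy per bin, and average weighted by bin size."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Sanity check: when accuracy tracks confidence, ECE is near zero.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 100_000)
correct = (rng.uniform(size=conf.shape) < conf).astype(float)
print(expected_calibration_error(conf, correct))  # small, near 0
```

Running the identical computation on the Lorentz head's confidence map and on the Euclidean baseline's softmax confidences would make the falsification test above quantitative.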

Figures

Figures reproduced from arXiv:2604.16836 by Masud Ahmed, Nirmalya Roy, Zahid Hasan.

Figure 1: (a) Lorentz model achieves flatter minima (bottom) compared to Euclidean model (top). (b) Semantic mask for a sample of ADE20K.
Figure 2: (a) Label information encoding. (b) Lorentz space.
Figure 4: MaskFormer architecture adaptation for the hyperbolic …
Figure 5: Directions for the (a) Euclidean distance (blue) and …
Figure 6: Qualitative segmentation results across Cityscapes, …
Figure 7: Qualitative segmentation and angle-based boundary …
Figure 8: Qualitative segmentation and angle-based boundary …
Figure 10: Text query-based segmentation using mask classification …
Figure 11: Hierarchical class retrieval example from COCO-Stuff.
Figure 13: Image mask captured by different keywords. We point to …
Figure 14: Higher uncertainty for the zero-shot retrieved objects (dis…
read the original abstract

Semantic segmentation in hyperbolic space enables compact modeling of hierarchical structure while providing inherent uncertainty quantification. Prior approaches predominantly rely on the Poincaré ball model, which suffers from numerical instability, optimization, and computational challenges. We propose a novel, tractable, architecture-agnostic semantic segmentation framework (pixel-wise and mask classification) in the hyperbolic Lorentz model. We employ text embeddings with semantic and visual cues to guide hierarchical pixel-level representations in Lorentz space. This enables stable and efficient optimization without requiring a Riemannian optimizer, and easily integrates with existing Euclidean architectures. Beyond segmentation, our approach yields free uncertainty estimation, confidence map, boundary delineation, hierarchical and text-based retrieval, and zero-shot performance, reaching generalized flatter minima. We introduce a novel uncertainty and confidence indicator in Lorentz cone embeddings. Further, we provide analytical and empirical insights into Lorentz optimization via gradient analysis. Extensive experiments on ADE20K, COCO-Stuff-164k, Pascal-VOC, and Cityscapes, utilizing state-of-the-art per-pixel classification models (DeepLabV3 and SegFormer) and mask classification models (Mask2Former and MaskFormer), validate the effectiveness and generality of our approach. Our results demonstrate the potential of hyperbolic Lorentz embeddings for robust and uncertainty-aware semantic segmentation. Code is available at https://github.com/mxahan/Lorentz_semantic_segmentation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a novel, architecture-agnostic semantic segmentation framework (for both pixel-wise and mask classification) that embeds features into the hyperbolic Lorentz model while retaining Euclidean backbones. It claims that text embeddings guide hierarchical pixel-level representations, enabling stable optimization without Riemannian optimizers, easy integration with models such as DeepLabV3, SegFormer, MaskFormer, and Mask2Former, and 'free' benefits including uncertainty estimation, confidence maps, boundary delineation, hierarchical/text-based retrieval, zero-shot performance, and generalized flatter minima. The approach is evaluated on ADE20K, COCO-Stuff-164k, Pascal-VOC, and Cityscapes.

Significance. If the central claims hold, the work would be significant for practical deployment of hyperbolic geometry in computer vision: it targets the numerical and optimization drawbacks of the Poincaré ball while preserving hierarchical modeling and adding uncertainty quantification without auxiliary heads or special optimizers. The architecture-agnostic integration and multi-task outputs (retrieval, zero-shot) could broaden adoption of non-Euclidean embeddings in segmentation pipelines.
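The Poincaré-ball drawback the report refers to is demonstrable in a few lines: a point at moderate hyperbolic distance already saturates the ball's float64 boundary, while the hyperboloid coordinate stays representable. This is a standard illustration of the known saturation effect, not material from the paper.

```python
import numpy as np

r = 40.0  # hyperbolic distance of a point from the origin (curvature -1)

# Poincare ball: the point sits at radius tanh(r/2); recovering the
# distance uses 2 * artanh(radius).
p = np.tanh(r / 2)               # rounds to exactly 1.0 in float64
d_poincare = 2 * np.arctanh(p)   # -> inf: the boundary has been hit

# Lorentz model: the same point has time component cosh(r); recovering
# the distance uses arccosh, which has no hard boundary to saturate.
x0 = np.cosh(r)
d_lorentz = np.arccosh(x0)

print(p, d_poincare)             # 1.0 inf
print(d_lorentz)                 # ~40.0
```

The hyperboloid representation only fails once cosh(r) itself overflows (around r ≈ 710 in float64), far beyond where the Poincaré radius collapses onto the boundary.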

major comments (2)
  1. [Abstract and Methods] The central claim that embedding only the final pixel/mask head into the Lorentz model (while the backbone remains Euclidean) automatically supplies hierarchical structure, numerical stability, and free uncertainty quantification is load-bearing yet unsupported by any derivation of the projection operator, the exact Minkowski-space classification loss, or the uncertainty indicator (e.g., whether it derives from the Lorentz inner product or cone aperture versus an auxiliary computation). This must be shown explicitly, as a standard MLP-plus-normalization step would reduce the geometry to a metric change without the advertised benefits.
  2. [Abstract and Experiments] The abstract asserts effectiveness across four datasets and four architectures yet supplies no quantitative results, ablation tables, or derivation steps for the claimed 'free uncertainty' and 'generalized flatter minima'; without these, the empirical validation of the framework's generality and the geometric advantages cannot be assessed.
minor comments (1)
  1. Clarify the precise form of the text-embedding guidance and how it interacts with the Lorentz cone embeddings; the current description leaves the integration mechanism underspecified.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point-by-point below and will revise the manuscript to provide the requested explicit derivations and additional empirical details.

read point-by-point responses
  1. Referee: [Abstract and Methods] The central claim that embedding only the final pixel/mask head into the Lorentz model (while the backbone remains Euclidean) automatically supplies hierarchical structure, numerical stability, and free uncertainty quantification is load-bearing yet unsupported by any derivation of the projection operator, the exact Minkowski-space classification loss, or the uncertainty indicator (e.g., whether it derives from the Lorentz inner product or cone aperture versus an auxiliary computation). This must be shown explicitly, as a standard MLP-plus-normalization step would reduce the geometry to a metric change without the advertised benefits.

    Authors: We agree that the derivations must be presented more explicitly. The manuscript already derives the Euclidean-to-Lorentz projection in Section 3.2 (using the standard hyperboloid embedding formula) and defines the classification loss via the Minkowski inner product in Equation (5). The uncertainty indicator is derived in Section 4.1 directly from the Lorentz cone aperture (norm of the time-like component) without auxiliary heads. However, to strengthen the presentation, we will add a dedicated subsection with full step-by-step derivations, including the projection operator, the exact loss, and a short proof that the uncertainty and hierarchical properties arise from the geometry rather than a simple metric rescaling. This will clarify why the benefits are not reducible to an MLP-plus-normalization step. revision: yes

  2. Referee: [Abstract and Experiments] The abstract asserts effectiveness across four datasets and four architectures yet supplies no quantitative results, ablation tables, or derivation steps for the claimed 'free uncertainty' and 'generalized flatter minima'; without these, the empirical validation of the framework's generality and the geometric advantages cannot be assessed.

    Authors: The full manuscript already contains quantitative results (Tables 1–4) and ablations (Section 5.3) across ADE20K, COCO-Stuff, Pascal-VOC, and Cityscapes with DeepLabV3, SegFormer, Mask2Former, and MaskFormer. The abstract is intentionally concise, but we will revise it to include a few key mIoU numbers and will expand Section 5 with a new ablation table isolating the contribution of the Lorentz head to uncertainty and optimization. We will also add the requested derivation steps for free uncertainty (from the cone geometry) and the gradient analysis supporting flatter minima (already present in Section 3.4 but now cross-referenced to experiments). revision: partial
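The rebuttal's description of an uncertainty indicator read off the time-like component can be sketched as a toy. The proxy below (1/x0) is hypothetical, chosen only to illustrate the direction of the effect, and is not the paper's exact formula:

```python
import numpy as np

def lift(v):
    """Origin exponential map onto the hyperboloid (curvature -1)."""
    n = np.clip(np.linalg.norm(v, axis=-1, keepdims=True), 1e-8, None)
    return np.concatenate([np.cosh(n), np.sinh(n) * v / n], axis=-1)

def uncertainty_proxy(x):
    """Hypothetical indicator (NOT the paper's exact formula): embeddings near
    the hyperboloid apex (time component x0 near 1) are generic, hence more
    uncertain; 1/x0 maps x0 in [1, inf) to an uncertainty score in (0, 1]."""
    return 1.0 / x[..., 0]

confident = lift(5.0 * np.ones((1, 8)))  # large-norm feature, far from apex
vague = lift(0.1 * np.ones((1, 8)))      # small-norm feature, near the apex
print(uncertainty_proxy(confident)[0], uncertainty_proxy(vague)[0])
```

The point of such an indicator being "free" is that x0 is already computed during the lift, so no auxiliary head or extra forward pass is needed.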

Circularity Check

0 steps flagged

No circularity: Lorentz framework presented as independent construction with external validation

full rationale

The paper introduces a new architecture-agnostic framework for semantic segmentation in the Lorentz model, using text embeddings for guidance and a novel uncertainty indicator derived from cone embeddings. No load-bearing steps reduce claimed benefits (stability without Riemannian optimizers, free uncertainty, hierarchical structure) to self-defined parameters, fitted inputs renamed as predictions, or self-citation chains. The derivation relies on standard properties of the Lorentz model and integration with existing Euclidean backbones (DeepLabV3, SegFormer, etc.), validated empirically on ADE20K, COCO-Stuff, Pascal-VOC, and Cityscapes. The central claims remain independent of the paper's own fitted values or prior author results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only review; the central claim rests on the domain assumption that hyperbolic geometry (specifically Lorentz) naturally encodes hierarchy and uncertainty for pixel labels, plus the engineering claim that Euclidean optimizers suffice.

axioms (2)
  • domain assumption Hyperbolic space provides compact hierarchical modeling superior to Euclidean space for semantic segmentation
    Stated in the first sentence of the abstract as the motivation for moving to Lorentz space.
  • domain assumption Text embeddings can reliably guide pixel-level hyperbolic representations
    Mentioned as the mechanism that 'enables' the framework.

pith-pipeline@v0.9.0 · 5537 in / 1355 out tokens · 45890 ms · 2026-05-10T07:05:41.891393+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

66 extracted references · 9 canonical work pages · 3 internal anchors

  1. [1] Tree-like structure in large social and information networks
    Aaron B Adcock, Blair D Sullivan, and Michael W Mahoney. Tree-like structure in large social and information networks. In 2013 IEEE 13th International Conference on Data Mining, pages 1–10. IEEE, 2013.

  2. [2] Hyperbolic image segmentation
    Mina Ghadimi Atigh, Julian Schoep, Erman Acar, Nanne Van Noord, and Pascal Mettes. Hyperbolic image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4453–4462, 2022.

  3. [3] Metric spaces of non-positive curvature
    Martin R Bridson and André Haefliger. Metric Spaces of Non-Positive Curvature, volume 319. Springer Science & Business Media, 2013.

  4. [4] COCO-Stuff: Thing and stuff classes in context
    Holger Caesar, Jasper Uijlings, and Vittorio Ferrari. COCO-Stuff: Thing and stuff classes in context. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1209–1218, 2018.

  5. [5] Hyperbolic geometry
    James W Cannon, William J Floyd, Richard Kenyon, Walter R Parry, et al. Hyperbolic geometry. Flavors of Geometry, 31(59-115):2, 1997.

  6. [6] End-to-end object detection with transformers
    Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In European Conference on Computer Vision, pages 213–229. Springer, 2020.

  7. [7] Hyperbolic uncertainty aware semantic segmentation
    Bike Chen, Wei Peng, Xiaofeng Cao, and Juha Röning. Hyperbolic uncertainty aware semantic segmentation. IEEE Transactions on Intelligent Transportation Systems, 25(2):1275–1290, 2023.

  8. [8] Rethinking atrous convolution for semantic image segmentation
    Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587, 2017.

  9. [9] Encoder-decoder with atrous separable convolution for semantic image segmentation
    Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 801–818, 2018.

  10. [10] Masked-attention mask transformer for universal image segmentation
    Bowen Cheng, Ishan Misra, Alexander G Schwing, Alexander Kirillov, and Rohit Girdhar. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1290–1299, 2022.

  11. [11] Per-pixel classification is not all you need for semantic segmentation
    Bowen Cheng, Alex Schwing, and Alexander Kirillov. Per-pixel classification is not all you need for semantic segmentation. Advances in Neural Information Processing Systems, 34:17864–17875, 2021.

  12. [12] Cat-seg: Cost aggregation for open-vocabulary semantic segmentation
    Seokju Cho, Heeseong Shin, Sunghwan Hong, Anurag Arnab, Paul Hongsuck Seo, and Seungryong Kim. Cat-seg: Cost aggregation for open-vocabulary semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4113–4123, 2024.

  13. [13] The Cityscapes dataset for semantic urban scene understanding
    Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3213–3223, 2016.

  14. [14] MTA-CLIP: Language-guided semantic segmentation with mask-text alignment
    Anurag Das, Xinting Hu, Li Jiang, and Bernt Schiele. MTA-CLIP: Language-guided semantic segmentation with mask-text alignment. In European Conference on Computer Vision, pages 39–56. Springer, 2024.

  15. [15] Hyperbolic image-text representations
    Karan Desai, Maximilian Nickel, Tanmay Rajpurohit, Justin Johnson, and Shanmukha Ramakrishna Vedantam. Hyperbolic image-text representations. In International Conference on Machine Learning, pages 7694–7731. PMLR, 2023.

  16. [16] Sharp minima can generalize for deep nets
    Laurent Dinh, Razvan Pascanu, Samy Bengio, and Yoshua Bengio. Sharp minima can generalize for deep nets. In International Conference on Machine Learning, pages 1019–1028. PMLR, 2017.

  17. [17] Adapting auxiliary losses using gradient similarity
    Yunshu Du, Wojciech M Czarnecki, Siddhant M Jayakumar, Mehrdad Farajtabar, Razvan Pascanu, and Balaji Lakshminarayanan. Adapting auxiliary losses using gradient similarity. arXiv preprint arXiv:1812.02224, 2018.

  18. [18] The PASCAL Visual Object Classes (VOC) challenge
    Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The PASCAL Visual Object Classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338, 2010.

  19. [19] Computing the Gromov hyperbolicity of a discrete metric space
    Hervé Fournier, Anas Ismail, and Antoine Vigneron. Computing the Gromov hyperbolicity of a discrete metric space. Information Processing Letters, 115(6-8):576–579, 2015.

  20. [20] Hyperbolic active learning for semantic segmentation under domain shift
    Luca Franco, Paolo Mandica, Konstantinos Kallidromitis, Devin Guillory, Yu-Teng Li, Trevor Darrell, and Fabio Galasso. Hyperbolic active learning for semantic segmentation under domain shift. arXiv preprint arXiv:2306.11180, 2023.

  21. [21] Hyperbolic self-paced learning for self-supervised skeleton-based action representations
    Luca Franco, Paolo Mandica, Bharti Munjal, and Fabio Galasso. Hyperbolic self-paced learning for self-supervised skeleton-based action representations. arXiv preprint arXiv:2303.06242, 2023.

  22. [22] Hyperbolic entailment cones for learning hierarchical embeddings
    Octavian Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic entailment cones for learning hierarchical embeddings. In International Conference on Machine Learning, pages 1646–1655. PMLR, 2018.

  23. [23] Hyperbolic neural networks
    Octavian Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic neural networks. Advances in Neural Information Processing Systems, 31, 2018.

  24. [24] Hyperbolic groups
    Mikhael Gromov. Hyperbolic groups. In Essays in Group Theory, pages 75–263. Springer, 1987.

  25. [25] Lorentz entailment cone for semantic segmentation
    Zahid Hasan, Masud Ahmed, and Nirmalya Roy. Lorentz entailment cone for semantic segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 5216–5225, 2026.

  26. [26] Mask R-CNN
    Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017.

  27. [27] Taxonomy-aware continual semantic segmentation in hyperbolic spaces for open-world perception
    Julia Hindel, Daniele Cattaneo, and Abhinav Valada. Taxonomy-aware continual semantic segmentation in hyperbolic spaces for open-world perception. IEEE Robotics and Automation Letters, 2024.

  28. [28] Intriguing properties of hyperbolic embeddings in vision-language models
    Sarah Ibrahimi, Mina Ghadimi Atigh, Nanne Van Noord, Pascal Mettes, and Marcel Worring. Intriguing properties of hyperbolic embeddings in vision-language models. Transactions on Machine Learning Research, 2024.

  29. [29] On large-batch training for deep learning: Generalization gap and sharp minima
    Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. On large-batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836, 2016.

  30. [30] Hyperbolic image embeddings
    Valentin Khrulkov, Leyla Mirvakhabova, Evgeniya Ustinova, Ivan Oseledets, and Victor Lempitsky. Hyperbolic image embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6418–6428.

  31. [31] Probabilistic prompt learning for dense prediction
    Hyeongjun Kwon, Taeyong Song, Somi Jeong, Jin Kim, Jinhyun Jang, and Kwanghoon Sohn. Probabilistic prompt learning for dense prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6768–6777, 2023.

  32. [32] Exploring simple open-vocabulary semantic segmentation
    Zihang Lai. Exploring simple open-vocabulary semantic segmentation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 30221–30230, 2025.

  33. [33] Lorentzian distance learning for hyperbolic representations
    Marc Law, Renjie Liao, Jake Snell, and Richard Zemel. Lorentzian distance learning for hyperbolic representations. In International Conference on Machine Learning, pages 3672–3681. PMLR, 2019.

  34. [34] Inferring concept hierarchies from text corpora via hyperbolic embeddings
    Matt Le, Stephen Roller, Laetitia Papaxanthos, Douwe Kiela, and Maximilian Nickel. Inferring concept hierarchies from text corpora via hyperbolic embeddings. arXiv preprint arXiv:1902.00913, 2019.

  35. [35] Scalable multitask learning using gradient-based estimation of task affinity
    Dongyue Li, Aneesh Sharma, and Hongyang R Zhang. Scalable multitask learning using gradient-based estimation of task affinity. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1542–1553, 2024.

  36. [36] Visualizing the loss landscape of neural nets
    Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. Visualizing the loss landscape of neural nets. Advances in Neural Information Processing Systems, 31, 2018.

  37. [37] Microsoft COCO: Common objects in context
    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.

  38. [38] Swin transformer: Hierarchical vision transformer using shifted windows
    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.

  39. [39] Hyperbolic deep learning in computer vision: A survey
    Pascal Mettes, Mina Ghadimi Atigh, Martin Keller-Ressel, Jeffrey Gu, and Serena Yeung. Hyperbolic deep learning in computer vision: A survey. International Journal of Computer Vision, 132(9):3484–3508, 2024.

  40. [40] WordNet: a lexical database for English
    George A Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995.

  41. [41] The numerical stability of hyperbolic representation learning
    Gal Mishne, Zhengchao Wan, Yusu Wang, and Sheng Yang. The numerical stability of hyperbolic representation learning. In International Conference on Machine Learning, pages 24925–24949. PMLR, 2023.

  42. [42] Hyperbolic U-Net for robust medical image segmentation
    Swasti Shreya Mishra, Max van Spengler, Erwin Berkhout, and Pascal Mettes. Hyperbolic U-Net for robust medical image segmentation. In Medical Imaging with Deep Learning, 2026.

  43. [43] Poincaré embeddings for learning hierarchical representations
    Maximillian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representations. Advances in Neural Information Processing Systems, 30, 2017.

  44. [44] Learning continuous hierarchies in the Lorentz model of hyperbolic geometry
    Maximillian Nickel and Douwe Kiela. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In International Conference on Machine Learning, pages 3779–3788. PMLR, 2018.

  45. [45] Hyperbolic deep neural networks: A survey
    Wei Peng, Tuomas Varanka, Abdelrahman Mostafa, Henglin Shi, and Guoying Zhao. Hyperbolic deep neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12):10023–10044, 2021.

  46. [46] Understanding fine-tuning CLIP for open-vocabulary semantic segmentation in hyperbolic space
    Zelin Peng, Zhengqin Xu, Zhilin Zeng, Changsong Wen, Yu Huang, Menglin Yang, Feilong Tang, and Wei Shen. Understanding fine-tuning CLIP for open-vocabulary semantic segmentation in hyperbolic space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4562–4572, 2025.

  47. [47] Learning transferable visual models from natural language supervision
    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR.

  48. [48] DenseCLIP: Language-guided dense prediction with context-aware prompting
    Yongming Rao, Wenliang Zhao, Guangyi Chen, Yansong Tang, Zheng Zhu, Guan Huang, Jie Zhou, and Jiwen Lu. DenseCLIP: Language-guided dense prediction with context-aware prompting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18082–18091, 2022.

  49. [49] Sentence-BERT: Sentence embeddings using Siamese BERT-networks
    Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084, 2019.

  50. [50] U-Net: Convolutional networks for biomedical image segmentation
    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer.

  51. [51] Representation tradeoffs for hyperbolic embeddings
    Frederic Sala, Chris De Sa, Albert Gu, and Christopher Ré. Representation tradeoffs for hyperbolic embeddings. In International Conference on Machine Learning, pages 4460–4469. PMLR, 2018.

  52. [52] Low distortion Delaunay embedding of trees in hyperbolic plane
    Rik Sarkar. Low distortion Delaunay embedding of trees in hyperbolic plane. In International Symposium on Graph Drawing, pages 355–366. Springer, 2011.

  53. [53] Order-embeddings of images and language
    Ivan Vendrov, Ryan Kiros, Sanja Fidler, and Raquel Urtasun. Order-embeddings of images and language. arXiv preprint arXiv:1511.06361, 2015.

  54. [54] SPoT: Better frozen model adaptation through soft prompt transfer
    Tu Vu, Brian Lester, Noah Constant, Rami Al-Rfou, and Daniel Cer. SPoT: Better frozen model adaptation through soft prompt transfer. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5039–5059, 2022.

  55. [55] Vision transformer adapter-based hyperbolic embeddings for multi-lesion segmentation in diabetic retinopathy
    Zijian Wang, Haimei Lu, Haixin Yan, Hongxing Kan, and Li Jin. Vision transformer adapter-based hyperbolic embeddings for multi-lesion segmentation in diabetic retinopathy. Scientific Reports, 13(1):11178, 2023.

  56. [56] Flattening the parent bias: Hierarchical semantic segmentation in the Poincaré ball
    Simon Weber, Bar Zöngür, Nikita Araslanov, and Daniel Cremers. Flattening the parent bias: Hierarchical semantic segmentation in the Poincaré ball. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 28223–28232, 2024.

  57. [57] Unsupervised discovery of the long-tail in instance segmentation using hierarchical self-supervision
    Zhenzhen Weng, Mehmet Giray Ogut, Shai Limonchik, and Serena Yeung. Unsupervised discovery of the long-tail in instance segmentation using hierarchical self-supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2603–2612, 2021.

  58. [58] Image-text co-decomposition for text-supervised semantic segmentation
    Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang, Chun-Pei Chen, Yu-Lun Liu, Min-Hung Chen, Hou-Ning Hu, Yung-Yu Chuang, and Yen-Yu Lin. Image-text co-decomposition for text-supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26794–26803, 2024.

  59. [59] Semantic projection network for zero- and few-label semantic segmentation
    Yongqin Xian, Subhabrata Choudhury, Yang He, Bernt Schiele, and Zeynep Akata. Semantic projection network for zero- and few-label semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8256–8265, 2019.

  60. [60] SegFormer: Simple and efficient design for semantic segmentation with transformers
    Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34:12077–12090.

  61. [61] Hyperbolic fine-tuning for large language models
    Menglin Yang, Aosong Feng, Bo Xiong, Jihong Liu, Irwin King, and Rex Ying. Hyperbolic fine-tuning for large language models. arXiv preprint arXiv:2410.04010, 2024.

  62. [62] A simple framework for text-supervised semantic segmentation
    Muyang Yi, Quan Cui, Hao Wu, Cheng Yang, Osamu Yoshie, and Hongtao Lu. A simple framework for text-supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7071–7080, 2023.

  63. [63] Gradient surgery for multi-task learning
    Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, and Chelsea Finn. Gradient surgery for multi-task learning. Advances in Neural Information Processing Systems, 33:5824–5836, 2020.

  64. [64] IFSeg: Image-free semantic segmentation via vision-language model
    Sukmin Yun, Seong Hyeon Park, Paul Hongsuck Seo, and Jinwoo Shin. IFSeg: Image-free semantic segmentation via vision-language model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2967–2977, 2023.

  65. [65] Open vocabulary scene parsing
    Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, and Antonio Torralba. Open vocabulary scene parsing. In Proceedings of the IEEE International Conference on Computer Vision, pages 2002–2010, 2017.

  66. [66] Semantic understanding of scenes through the ADE20K dataset
    Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Semantic understanding of scenes through the ADE20K dataset. International Journal of Computer Vision, 127(3):302–321, 2019.