pith. machine review for the scientific record.

arxiv: 2604.04158 · v1 · submitted 2026-04-05 · 💻 cs.CV

Recognition: 3 theorem links


Hierarchical Co-Embedding of Font Shapes and Impression Tags

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 16:50 UTC · model grok-4.3

classification 💻 cs.CV
keywords: hyperbolic embedding · font retrieval · impression tags · style specificity · entailment · co-embedding · MyFonts

The pith

Hyperbolic co-embedding places fonts and impressions in a shared space so that entailment relations produce a radial measure of style specificity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that font shapes and impression tags correspond in a graded way, with some tags compatible with many styles and others tightly constraining the plausible fonts. It models this graded strength, called style specificity, by embedding both fonts and tags in hyperbolic space and enforcing two entailment rules: each impression entails compatible fonts, and less specific impressions entail more specific ones. The geometry that results automatically arranges low-specificity tags near the origin and high-specificity tags farther out, giving distance from the center an immediate interpretation as constraint strength. Experiments on the MyFonts dataset show this structure improves retrieval in both directions compared with ordinary paired embeddings.
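This radial reading reduces to a one-line computation. A minimal sketch, assuming the Poincaré ball model with curvature −1 (the abstract does not fix the model) and using hypothetical embedding coordinates:

```python
import numpy as np

def poincare_dist_from_origin(x):
    """Geodesic distance from the origin in the Poincare ball (curvature -1):
    d(o, x) = 2 * artanh(||x||). Under the paper's radial reading, a larger
    value means a more style-specific (more constraining) impression."""
    r = np.linalg.norm(x, axis=-1)
    return 2.0 * np.arctanh(np.clip(r, 0.0, 1.0 - 1e-7))

# Hypothetical tag embeddings: a broad tag near the origin, a narrow one far out.
broad = np.array([0.05, 0.02])
narrow = np.array([0.70, 0.55])
specificity = {"broad": poincare_dist_from_origin(broad),
               "narrow": poincare_dist_from_origin(narrow)}
```

On these placeholder points the narrow tag scores roughly 2.8 versus about 0.1 for the broad one, which is the ordering Figure 5's histograms visualize.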

Core claim

By jointly embedding font images and impression tags or tag sets in hyperbolic space under impression-to-font entailment and low-to-high style-specificity entailment, the learned space acquires a radial organization in which distance from the origin directly quantifies how narrowly an impression constrains font style, and this organization produces higher bidirectional retrieval accuracy than one-to-one alignment baselines.

What carries the argument

Hyperbolic co-embedding with impression-to-font entailment and low-to-high style-specificity entailment, which together induce the radial ordering of specificity.
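The aper(·) and ext(·, ·) terms named in Figure 4 match the standard entailment-cone construction of Ganea et al. (2018); whether the paper uses exactly this form is an assumption here. A sketch with an illustrative cone constant K and made-up coordinates:

```python
import numpy as np

EPS = 1e-7
K = 0.1  # cone constant from Ganea et al. (2018); an assumed value

def aperture(x):
    """Half-aperture of the entailment cone rooted at x. Cones widen as x
    approaches the origin, so near-origin points entail larger regions --
    matching low style specificity."""
    nx = np.linalg.norm(x)
    return np.arcsin(np.clip(K * (1 - nx**2) / max(nx, EPS), -1.0, 1.0))

def exterior_angle(x, y):
    """Angle at x between the half-line from the origin through x and the
    geodesic from x to y (the ext(x, y) term)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    dot = float(x @ y)
    num = dot * (1 + nx**2) - nx**2 * (1 + ny**2)
    den = nx * np.linalg.norm(x - y) * np.sqrt(max(1 + nx**2 * ny**2 - 2 * dot, EPS))
    return np.arccos(np.clip(num / max(den, EPS), -1.0, 1.0))

def entailment_energy(x, y):
    """Hinge penalty: zero exactly when y lies inside the cone rooted at x."""
    return max(0.0, exterior_angle(x, y) - aperture(x))
```

A point farther out along the same ray as x incurs zero energy (x entails it); a point in an unrelated direction incurs a positive penalty, which is what the two entailment losses minimize.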

If this is right

  • Bidirectional retrieval of fonts from impressions and impressions from fonts improves over one-to-one baselines.
  • Distance from the origin supplies a direct, data-driven scalar for how constraining any given impression is.
  • Traversal along radial lines produces a coherent ordering from broad to narrow impressions.
  • Tag-level inspection reveals which impressions are treated as general versus style-specific in the learned geometry.
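The traversal in the third bullet is cheap to reproduce because geodesics through the origin of the Poincaré ball are straight Euclidean segments; only the step spacing needs hyperbolic correction. A sketch (ball model assumed; the nearest-tag lookup at each point is omitted):

```python
import numpy as np

def radial_traversal(target, n_steps=5):
    """Points along the geodesic from the origin o to `target` in the
    Poincare ball, spaced evenly in geodesic (not Euclidean) distance.
    A point at geodesic distance t from o in direction u is tanh(t/2) * u."""
    direction = target / np.linalg.norm(target)
    total = 2.0 * np.arctanh(np.linalg.norm(target))  # d(o, target)
    return [np.tanh((total * k / n_steps) / 2.0) * direction
            for k in range(1, n_steps + 1)]
```

Retrieving the nearest impression tag at each returned point would yield the broad-to-narrow sequence the paper's traversal analysis reports.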

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same entailment-plus-hyperbolic pattern could be applied to other graded visual-textual domains such as clothing descriptions and garment images.
  • The radial specificity score might be used directly in recommendation interfaces to surface fonts that match narrow versus open-ended user requests.
  • A controlled ablation that keeps the entailment rules but switches to Euclidean space would test whether the hyperbolic geometry itself is required.

Load-bearing premise

The chosen entailment constraints correctly reflect the underlying hierarchical relationships between impressions and fonts rather than simply fitting the training data after the fact.

What would settle it

If a Euclidean embedding or a standard paired alignment model without the entailment constraints matches or exceeds the reported retrieval scores on the same MyFonts test set, the claimed advantage of the hyperbolic entailment structure would not hold.

Figures

Figures reproduced from arXiv: 2604.04158 by Kaito Shiku, Seiichi Uchida, Yugo Kubota.

Figure 1. Fonts and their associated impression tags in the MyFonts dataset [1].

Figure 2. Hyperbolic co-embedding with entailment cones. (a) Feature space where…

Figure 3. Cone aperture and style specificity. Cones widen near the origin for low…

Figure 4. Overview. We embed a font Fn, its impression tag set Sn, and a subset S̃n in a shared hyperbolic space. Entailment cones place the lower style-specificity embedding of S̃n nearer the origin and the higher style-specificity embedding of Sn farther away, enforcing ĩn → in. The cones also impose impression-to-font entailment in → fn through aper(·) and ext(·, ·). To encode these entailment-based hierarchical…

Figure 5. Histograms of distances from the origin o for fonts, impression-tag sets, and impression-tag subsets on the test split. Impression-CLIP+ yields substantial overlap among the three distance distributions, indicating limited hierarchical separation, whereas our method exhibits a clear radial ordering…

Figure 6. Traversal from the origin o to a target font embedding. Our method shows a coherent shift from abstract to more style-specific impressions while remaining semantically consistent with the target font style. …and "casual" appear consistently along the path from near o to the font. A similar trend is observed in the right example, where tags such as "serif" are retrieved together with increasingly specific im…

Figure 7. Style specificity of impression tags. The top-left plot shows the distribu…
read the original abstract

Font shapes can evoke a wide range of impressions, but the correspondence between fonts and impression descriptions is not one-to-one: some impressions are broadly compatible with diverse styles, whereas others strongly constrain the set of plausible fonts. We refer to this graded constraint strength as style specificity. In this paper, we propose a hyperbolic co-embedding framework that models font--impression correspondence through entailment rather than simple paired alignment. Font images and impression descriptions, represented as single tags or tag sets, are embedded in a shared hyperbolic space with two complementary entailment constraints: impression-to-font entailment and low-to-high style-specificity entailment among impressions. This formulation induces a radial structure in which low style-specificity impressions lie near the origin and high style-specificity impressions lie farther away, yielding an interpretable geometric measure of how strongly an impression constrains font style. Experiments on the MyFonts dataset demonstrate improved bidirectional retrieval over strong one-to-one baselines. In addition, traversal and tag-level analyses show that the learned space captures a coherent progression from ambiguous to more style-specific impressions and provides a meaningful, data-driven quantification of style specificity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a hyperbolic co-embedding model for font images and impression tags (single or sets) that incorporates two entailment losses—impression-to-font and low-to-high style-specificity—to induce a radial hierarchy in hyperbolic space. Low-specificity impressions cluster near the origin while high-specificity ones lie farther out, providing an interpretable geometric measure of style specificity. On the MyFonts dataset the approach reports improved bidirectional retrieval mAP over one-to-one baselines, with additional qualitative support from traversals and tag analyses.

Significance. If the central claims hold after addressing the gaps below, the work supplies a geometrically grounded, hierarchical representation for font–impression correspondence that could aid design retrieval and generation tools. The explicit modeling of graded constraint strength via radial distance is a distinctive contribution relative to standard contrastive alignment, and the hyperbolic geometry choice aligns naturally with the entailment inductive bias.

major comments (3)
  1. [§4.2] (Entailment losses): No ablation is reported that removes the impression-to-font and low-to-high specificity entailment terms while retaining the same hyperbolic co-embedding architecture and contrastive backbone. Without this control it remains possible that any hyperbolic embedding trained with paired alignment alone would produce a comparable mAP and a similar radial gradient simply because the origin is fixed.
  2. [Table 1] (Bidirectional retrieval): Results are presented as single mAP numbers without error bars, standard deviations across runs, or statistical significance tests. Given that the headline gains are modest, the absence of these statistics prevents assessment of whether the improvements are reliable.
  3. [§5.3] (Style-specificity measure): The radial distance is defined directly from the learned embedding and used both to train the model and to quantify specificity. No external validation against human-rated specificity scores or an independent dataset is provided, leaving open the possibility that the measure is an artifact of the training objective rather than a predictive property of the data.
minor comments (2)
  1. [Abstract] The abstract states that 'traversal and tag-level analyses show coherent progression' but does not report any quantitative metric (e.g., ordering correlation or human preference scores) for these analyses.
  2. [Methods] Notation for the two entailment losses (impression-to-font and specificity hierarchy) is introduced without an explicit equation reference in the main text; adding a numbered equation would improve traceability.
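Major comment 2 could be resolved without retraining if per-query average-precision scores are logged: a paired bootstrap over per-query differences gives a one-sided p-value for the mAP gain. The scores below are synthetic stand-ins, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)

def paired_bootstrap(ap_ours, ap_base, n_boot=10_000):
    """Paired bootstrap over per-query average-precision scores.
    Returns the fraction of resamples in which the baseline matches or
    beats the proposed model: a one-sided p-value for the mAP gain."""
    diffs = np.asarray(ap_ours) - np.asarray(ap_base)
    n = len(diffs)
    means = np.array([diffs[rng.integers(0, n, n)].mean()
                      for _ in range(n_boot)])
    return float((means <= 0).mean())

# Synthetic example: a small but consistent per-query advantage.
base = rng.uniform(0.2, 0.8, size=200)
ours = np.clip(base + rng.normal(0.03, 0.05, size=200), 0.0, 1.0)
p = paired_bootstrap(ours, base)
```

Pairing by query removes the large query-to-query variance, so even the modest headline gains the referee flags can be tested with the existing test split.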

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point by point below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [§4.2] (Entailment losses): No ablation is reported that removes the impression-to-font and low-to-high specificity entailment terms while retaining the same hyperbolic co-embedding architecture and contrastive backbone. Without this control it remains possible that any hyperbolic embedding trained with paired alignment alone would produce a comparable mAP and a similar radial gradient simply because the origin is fixed.

    Authors: We agree that an ablation isolating the entailment losses is needed. In the revised manuscript we will add results for the hyperbolic co-embedding model trained with only the contrastive loss (no entailment terms) and compare both retrieval mAP and the resulting radial gradient against the full model. This will demonstrate that the entailment constraints are responsible for the hierarchy and performance gains. revision: yes

  2. Referee: [Table 1] (Bidirectional retrieval): Results are presented as single mAP numbers without error bars, standard deviations across runs, or statistical significance tests. Given that the headline gains are modest, the absence of these statistics prevents assessment of whether the improvements are reliable.

    Authors: We will rerun all experiments with multiple random seeds (at least five) and report mean mAP values together with standard deviations in the revised Table 1. We will also include paired statistical significance tests against the baselines to establish reliability of the reported gains. revision: yes

  3. Referee: [§5.3] (Style-specificity measure): The radial distance is defined directly from the learned embedding and used both to train the model and to quantify specificity. No external validation against human-rated specificity scores or an independent dataset is provided, leaving open the possibility that the measure is an artifact of the training objective rather than a predictive property of the data.

    Authors: The radial distance is learned end-to-end from the entailment objective precisely to induce an interpretable geometric measure of specificity. While we lack external human-rated specificity scores, the qualitative traversal and tag analyses in §5.3 already show coherent progression from low- to high-specificity impressions. We will expand the discussion to explicitly note this limitation and will investigate possible correlations with any available external impression annotations. revision: partial

Circularity Check

1 step flagged

Radial style-specificity measure is defined by construction from the low-to-high entailment loss

specific steps
  1. self-definitional [Abstract]
    "low-to-high style-specificity entailment among impressions. This formulation induces a radial structure in which low style-specificity impressions lie near the origin and high style-specificity impressions lie farther away, yielding an interpretable geometric measure of how strongly an impression constrains font style."

    The entailment loss is constructed using style-specificity ordering to enforce the radial hierarchy; the radial distance is then declared to be the measure of style specificity. The geometric interpretation is therefore equivalent to the imposed constraint by design rather than discovered independently.

full rationale

The paper imposes a low-to-high style-specificity entailment constraint that explicitly forces low-specificity impressions near the origin and high-specificity ones farther out in hyperbolic space. The resulting radial distance is then presented as an interpretable geometric measure of specificity. This reduces the claimed quantification to a direct consequence of the training objective rather than an independent finding. Retrieval gains may retain independent value, but the interpretability claim does not. No external human-rated specificity benchmark is invoked to validate the radial ordering.
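One embedding-independent check of the radial ordering: correlate each tag's distance from the origin with a breadth proxy that never touches the model, such as the number of MyFonts fonts carrying the tag. Everything below is synthetic and only shows the shape of the test:

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation via Pearson on integer ranks (ties broken
    arbitrarily, which is adequate for a coarse sanity check)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

# Synthetic stand-ins: broadly applicable tags (many fonts) should sit nearer
# the origin, so radial distance and log font-count should anticorrelate.
rng = np.random.default_rng(1)
font_counts = rng.integers(5, 5000, size=300)          # breadth proxy per tag
radial = 3.0 - 0.5 * np.log(font_counts) + rng.normal(0.0, 0.3, size=300)

rho = spearman(radial, np.log(font_counts))  # strongly negative by design
```

A strongly negative rho on the real tags would support the interpretability claim; a weak one would suggest the radial measure is mostly an artifact of the loss, as the audit above suspects.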

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on standard hyperbolic geometry properties plus two modeling choices: that impression-to-font correspondence can be captured by entailment cones and that style specificity orders impressions radially. No new physical entities are introduced.

free parameters (2)
  • hyperbolic curvature
    Chosen to control the space's negative curvature; value not stated in abstract.
  • embedding dimension
    Standard hyperparameter that affects cone geometry and retrieval performance.
axioms (2)
  • standard math: Hyperbolic space admits cone-based entailment relations that can be optimized via loss functions.
    Invoked when defining the two entailment constraints.
  • domain assumption: Impression tags can be ordered by the breadth of compatible fonts they describe.
    Core premise that justifies the low-to-high specificity entailment.

pith-pipeline@v0.9.0 · 5495 in / 1373 out tokens · 21768 ms · 2026-05-13T16:50:54.321158+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

  1. [1] Chen, T., Wang, Z., Xu, N., Jin, H., Luo, J.: Large-scale tag-based font retrieval with generative feature learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 9116–9125 (2019)

  2. [2] Choi, S., Aizawa, K.: Emotype: Expressing emotions by changing typeface in mobile messenger texting. Multimedia Tools and Applications 78(11), 14155–14172 (2019)

  3. [3] Choi, S., Matsumura, S., Aizawa, K.: Assist users' interactions in font search with unexpected but useful concepts generated by multimodal learning. In: Proceedings of the International Conference on Multimedia Retrieval (ICMR). pp. 235–243 (2019)

  4. [4] Choi, S., Yamasaki, T., Aizawa, K.: Typeface emotion analysis for communication on mobile messengers. In: Proceedings of the 1st International Workshop on Multimedia Alternate Realities. pp. 37–40 (2016)

  5. [5] Chujo, R., Suzuki, A., Hautasaari, A.: Exploring the effects of Japanese font designs on impression formation and decision-making in text-based communication. IEICE Transactions on Information and Systems E107-D(3), 354–362 (2024)

  6. [6] Davis, R.C., Smith, H.J.: Determinants of feeling tone in type faces. Journal of Applied Psychology 17(6), 742–764 (1933)

  7. [7] Desai, K., Nickel, M., Rajpurohit, T., Johnson, J., Vedantam, R.: Hyperbolic image-text representations. In: International Conference on Machine Learning. pp. 7694–7731 (2023)

  8. [8] Ge, S., Mishra, S., Kornblith, S., Li, C.L., Jacobs, D.: Hyperbolic contrastive learning for visual representations beyond objects. In: Computer Vision and Pattern Recognition. pp. 6840–6849 (2023)

  9. [9] Gonzalez-Jimenez, A., Lionetti, S., Amruthalingam, L., Gottfrois, P., Gröger, F., Pouly, M., Navarini, A.A.: Is hyperbolic space all you need for medical anomaly detection? In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 312–322 (2025)

  10. [10] He, K., et al.: Deep residual learning for image recognition. In: Computer Vision and Pattern Recognition. pp. 770–778 (2016)

  11. [11] Henderson, P.W., Giese, J.L., Cote, J.A.: Impression management using typeface design. Journal of Marketing 68(4), 60–72 (2004)

  12. [12] Izumi, K., Yanai, K.: CLIPFontDraw: Stylizing fonts with CLIP. IEEE Access 12, 154811–154822 (2024)

  13. [13] Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. In: ACM Transactions on Information Systems. pp. 422–446 (2002)

  14. [14] Kang, J., Haraguchi, D., Matsuda, S., Kimura, A., Uchida, S.: Shared latent space of font shapes and their noisy impressions. In: Proceedings of the 28th International Conference on Multimedia Modeling (MMM). pp. 146–157 (2022)

  15. [15] Kang, L., Yang, F., Wang, K., Souibgui, M.A., Gomez, L., Fornés, A., Valveny, E., Karatzas, D.: GRIF-DM: Generation of rich impression fonts using diffusion models. In: Proceedings of the 27th European Conference on Artificial Intelligence (ECAI). pp. 226–233 (2024)

  16. [16] Kubota, Y., Haraguchi, D., Uchida, S.: Impression-CLIP: Contrastive shape-impression embedding for fonts. In: Proceedings of the 18th International Conference on Document Analysis and Recognition (ICDAR). pp. 70–85 (2024)

  17. [17] Kubota, Y., Uchida, S.: Embedding font impression word tags based on co-occurrence. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops. pp. 3514–3523 (October 2025)

  18. [18] Kulahcioglu, T., De Melo, G.: Fonts like this but happier: A new way to discover fonts. In: Proceedings of the 28th ACM International Conference on Multimedia (MM). pp. 2973–2981 (2020)

  19. [19] Li, H., Chen, Z., Xu, Y., Hu, J.: Hyperbolic anomaly detection. In: Computer Vision and Pattern Recognition. pp. 17511–17520 (2024)

  20. [20] Li, J., Wang, J., Tan, C., Lian, N., Chen, L., Wang, Y., Zhang, M., Xia, S.T., Chen, B.: Enhancing partially relevant video retrieval with hyperbolic learning. In: International Conference on Computer Vision. pp. 23074–23084 (2025)

  21. [21] Li, Y., Suen, C.Y.: Typeface personality traits and their design characteristics. In: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems. pp. 231–238 (2010)

  22. [22] Matsuda, S., Kimura, A., Uchida, S.: Impressions2Font: Generating fonts by specifying impressions. In: Proceedings of the 16th International Conference on Document Analysis and Recognition (ICDAR). pp. 739–754 (2021)

  23. [23] Matsuda, S., Kimura, A., Uchida, S.: Font generation with missing impression labels. In: Proceedings of the 26th International Conference on Pattern Recognition (ICPR). pp. 1400–1406 (2022)

  24. [24] Morrison, G.R.: Communicability of the emotional connotation of type. Educational Communication and Technology 34(4), 235–244 (1986)

  25. [25] Nickel, M., Kiela, D.: Poincaré embeddings for learning hierarchical representations. In: Advances in Neural Information Processing Systems (2017)

  26. [26] Nickel, M., Kiela, D.: Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In: International Conference on Machine Learning. pp. 3779–3788 (2018)

  27. [27] O'Donovan, P., Lībeks, J., Agarwala, A., Hertzmann, A.: Exploratory font selection using crowdsourced attributes. ACM Transactions on Graphics (TOG) 33(4), 1–9 (2014)

  28. [28] Pal, A., Van Spengler, M., di Melendugno, G.M.D., Flaborea, A., Galasso, F., Mettes, P.: Compositional entailment learning for hyperbolic vision-language models. In: International Conference on Learning Representations (2025)

  29. [29] Poffenberger, A.T., Franken, R.B.: A study of the appropriateness of type faces. Journal of Applied Psychology 7(4), 312–329 (1923)

  30. [30] Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: Proceedings of the 38th International Conference on Machine Learning (ICML). pp. 8748–8763 (2021)

  31. [31] Sala, F., De Sa, C., Gu, A., Ré, C.: Representation tradeoffs for hyperbolic embeddings. In: International Conference on Machine Learning. pp. 4460–4469 (2018)

  32. [32] Shaikh, A.D., Chaparro, B.S., Fox, D.: Perception of fonts: Perceived personality traits and uses. Usability News 8(1), 1–6 (2006)

  33. [33] de Sousa, M.M.M., Carvalho, F.M., Pereira, R.G.F.A.: Do typefaces of packaging labels influence consumers' perception of specialty coffee? A preliminary study. Journal of Sensory Studies 35(5), e12599 (2020)

  34. [34] Tatsukawa, Y., Shen, I.C., Qi, A., Koyama, Y., Igarashi, T., Shamir, A.: FontCLIP: A semantic typography visual-language model for multilingual font applications. Computer Graphics Forum (CGF) 43(2), e15043 (2024)

  35. [35] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems (2017)