
arxiv: 2604.20310 · v1 · submitted 2026-04-22 · 💻 cs.HC

Odor Maps from the LLM-derived similarity scores

Pith reviewed 2026-05-09 23:46 UTC · model grok-4.3

classification 💻 cs.HC
keywords odor maps · large language models · essential oils · similarity scores · human evaluation · Dravnieks dataset · olfactory space · pairwise distances

The pith

Large language models derive odor similarity scores that align with human data enough to map essential oils by group.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper checks whether large language models can estimate how similar different odors smell by deriving pairwise distances between odor descriptors from LLM output. These LLM distances are compared with distances computed from the established Dravnieks odor character dataset, and the two sets agree statistically to some degree. The authors then apply the same method to the names of essential oils and build a two-dimensional map in which oils from the same group sit close together. They read this clustering as a sign that LLM-derived scores may stand in for human perceptual judgments when constructing odor maps.
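The figure captions further down name cosine distance as the human-side baseline computed from the Dravnieks data. As a minimal sketch of that kind of calculation, the snippet below builds a pairwise cosine-distance table from rating vectors. The function names, the item names, and the toy ratings are all hypothetical and are not taken from the paper or the dataset.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(profiles):
    """Build a symmetric distance table (nested dict) from name -> vector."""
    names = sorted(profiles)
    return {
        a: {b: cosine_distance(profiles[a], profiles[b]) for b in names}
        for a in names
    }


# Hypothetical toy profiles: invented ratings on three invented axes.
toy_profiles = {
    "odor_a": [4.0, 0.0, 1.0],
    "odor_b": [8.0, 0.0, 2.0],  # same direction as odor_a, twice the magnitude
    "odor_c": [0.0, 5.0, 0.0],  # shares no rated axis with the other two
}
toy_distances = pairwise_distances(toy_profiles)
```

Cosine distance ignores overall rating magnitude, so `odor_a` and `odor_b` come out as identical in character even though one is rated twice as strongly. Whether that property is desirable for odor profiles is a modelling choice, not something this sketch settles.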

Core claim

Statistical comparison revealed that LLMs can infer odor similarity to some degree, suggesting the potential of odor maps generated from these similarity data. Applying this approach, we generated an odor map of essential oils. It demonstrates that essential oils within the same group are closely located in the odor map, suggesting that the proximity in the odor map corresponds to human evaluation.

What carries the argument

LLM-derived pairwise distances between odor descriptors or names, used as proxies to embed odors into a map where spatial closeness reflects similarity.
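The figure captions indicate that the maps are produced with multidimensional scaling (MDS). The sketch below shows how a pairwise distance matrix can be turned into 2D coordinates. It uses classical (Torgerson) MDS written directly in NumPy, which is a swapped-in variant chosen for brevity and determinism; it is not claimed to be the authors' configuration. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed points from a symmetric distance matrix via classical MDS."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to recover an inner-product matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip negative eigenvalues, which appear when distances are not Euclidean.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical toy input: the four corners of a unit square.
s = 2 ** 0.5
toy_dist = [
    [0, 1, s, 1],
    [1, 0, 1, s],
    [s, 1, 0, 1],
    [1, s, 1, 0],
]
coords = classical_mds(toy_dist)
# Pairwise Euclidean distances between the embedded points.
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

For distances that are exactly Euclidean in two dimensions, as in this toy case, the embedding reproduces them up to rotation and reflection. LLM-derived distances carry no such guarantee, so some distortion in a 2D map should be expected.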

If this is right

  • Any new collection of odor names can be placed on a map using only language-model queries without additional human testing.
  • Clustering of same-group items on the map can serve as an initial screen for whether a set of odors matches expected human categories.
  • The method supplies a scalable starting point for exploring smell spaces before committing to full sensory panels.
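The clustering screen in the second point could be made concrete by comparing the mean map distance within groups against the mean distance between groups. The snippet is a minimal sketch under that assumption. The statistic, the function name, the coordinates, and the group labels are all hypothetical; the paper is not stated to use this measure.

```python
from itertools import combinations


def group_separation(coords, groups):
    """Return (mean within-group distance, mean between-group distance)."""
    within, between = [], []
    for a, b in combinations(sorted(coords), 2):
        (xa, ya), (xb, yb) = coords[a], coords[b]
        dist = ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5
        # Route each pair to the bucket matching its group relationship.
        (within if groups[a] == groups[b] else between).append(dist)
    if not within or not between:
        raise ValueError("need at least one within-group and one between-group pair")
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical map coordinates and group labels, invented for illustration.
toy_coords = {
    "oil_1": (0.0, 0.0),
    "oil_2": (0.0, 1.0),
    "oil_3": (4.0, 0.0),
    "oil_4": (4.0, 1.0),
}
toy_groups = {"oil_1": "g1", "oil_2": "g1", "oil_3": "g2", "oil_4": "g2"}
within_mean, between_mean = group_separation(toy_coords, toy_groups)
```

A within-group mean well below the between-group mean is consistent with the expected categories, but it does not by itself show that the map matches human perception.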

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the maps prove stable, they could be used to suggest fragrance substitutions by recommending oils that sit near a target on the map.
  • The same distance technique might be applied to text descriptions of individual chemical compounds to link molecular names to perceived smells.
  • The approach invites direct comparison with maps built from other data sources such as gas-chromatography profiles or molecular descriptors.

Load-bearing premise

Agreement between LLM distances and the Dravnieks human dataset is strong enough that the resulting map of essential oils will match new human judgments of those same oils.

What would settle it

A new experiment that collects fresh human similarity ratings for pairs of the essential oils and checks whether map distances predict those ratings at a level clearly above chance.
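One way to run that check is a Mantel-style permutation test, the kind of test named in the Figure 4 caption. The sketch below is a pure-Python stand-in rather than a library routine: it correlates the upper triangles of two distance matrices and estimates a one-sided p-value by shuffling item labels. The function names and the toy matrices are hypothetical, and the "human" matrix is invented for illustration, not collected data.

```python
import random


def upper_triangle(matrix, order):
    """Flatten the strict upper triangle, with rows and columns taken in `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when all distances are equal")
    return cov / (vx * vy) ** 0.5


def mantel_test(d1, d2, permutations=999, seed=0):
    """One-sided test: is corr(d1, d2) larger than under label shuffling?"""
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    observed = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        # Permuting rows and columns together keeps d2 a valid distance matrix.
        if pearson(x, upper_triangle(d2, perm)) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy data: five items on a line, plus an invented "human"
# matrix that is a linear rescaling of the map distances.
positions = [0, 1, 2, 4, 7]
map_dist = [[abs(p - q) for q in positions] for p in positions]
human_dist = [
    [0 if i == j else 2 * map_dist[i][j] + 1 for j in range(5)]
    for i in range(5)
]
r, p_value = mantel_test(map_dist, human_dist)
```

Permuting labels, rather than shuffling individual distances, respects the fact that the entries of a distance matrix are not independent of one another.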

Figures

Figures reproduced from arXiv: 2604.20310 by Manabu Okumura, Manuel Aleixandre, Takamichi Nakamoto, Yuki Harada.

Figure 1: Two-dimensional mapping of odor names using MDS, based on (a) cosine distance calculated from the …
Figure 2: An example prompt to generate similarity scores; between 'cis-3-hexenol' and 'beta-ionone'
Figure 3: Comparing Dravnieks-dataset Cosine Distances and GPT-4o-mini Similarity; (a) A scatter plot displays the …
Figure 4: Heatmap visualization of Mantel statistical test results across the seven metrics for (a) the 146 odor descriptors …
Figure 5: Stress and the MDS dimension (n-components) based on seven different methods, (a) for 146 odor descriptors, …
Figure 6: Result of LLM-derived similarity scores for essential oils; (a) heatmap visualization of statistical test results …
Figure 7: Visualization of odor space based on pairwise similarities derived from GPT-4o-mini. (a) Odor map: 2D …

original abstract

The application of large language models (LLMs) to OdorSpace analysis attracts growing interest. Recent studies have explored the comparison of sensory evaluation spaces derived from LLMs with odor character profiles in the Dravnieks' dataset. In this study, we calculated pairwise distances of odor descriptors using three distance measures and statistically compared these LLM-derived similarities with distances derived from the original data. Next, we extended this approach to odor names (ingredients). Statistical comparison revealed that LLMs can infer odor similarity to some degree, suggesting the potential of odor maps generated from these similarity data. Applying this approach, we generated an odor map of essential oils. It demonstrates that essential oils within the same group are closely located in the odor map, suggesting that the proximity in the odor map corresponds to human evaluation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper explores using LLMs to derive pairwise similarity scores for odor descriptors via three distance measures, statistically comparing these to human distances in the Dravnieks dataset. It extends the method to odor ingredients and generates an odor map of essential oils, where oils from the same predefined groups cluster closely, interpreted as indicating that map proximity corresponds to human perceptual evaluation.

Significance. If the LLM-derived similarities prove to reliably match human perceptual distances and the map distances predict held-out human judgments, this could enable scalable, low-cost generation of odor maps for applications in perfumery, food science, and sensory research, reducing reliance on large human panels.

major comments (3)
  1. [Abstract] The claim that 'statistical comparison revealed that LLMs can infer odor similarity to some degree' provides no quantitative results (e.g., correlation coefficients, effect sizes, p-values, sample sizes, or details on the three distance measures and multiple-testing corrections), which is load-bearing for assessing whether the agreement with Dravnieks data is strong enough to support downstream claims.
  2. [Results] Odor map of essential oils: The observation that 'essential oils within the same group are closely located' does not establish that proximity corresponds to human evaluation, because the groups are not shown to be defined by independent human perceptual data, and no held-out human similarity ratings for the oils are collected to test whether Euclidean distances on the map predict those ratings.
  3. [Methods] The manuscript does not specify the LLM model, prompt templates, exact similarity computation from LLM outputs, or the embedding algorithm used to generate the odor map (e.g., MDS parameters), preventing assessment of reproducibility and potential confounds in the distance measures.
minor comments (1)
  1. [Abstract] The phrasing 'odor names (ingredients)' is unclear without examples or a definition of how ingredients differ from descriptors in the analysis.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive feedback on our manuscript. We have carefully considered each comment and provide point-by-point responses below, along with indications of revisions to be made in the updated version.

point-by-point responses
  1. Referee: [Abstract] The claim that 'statistical comparison revealed that LLMs can infer odor similarity to some degree' provides no quantitative results (e.g., correlation coefficients, effect sizes, p-values, sample sizes, or details on the three distance measures and multiple-testing corrections), which is load-bearing for assessing whether the agreement with Dravnieks data is strong enough to support downstream claims.

    Authors: We concur that quantitative metrics are necessary to substantiate the claim in the abstract. The revised manuscript will include the relevant statistical details, including correlation coefficients, p-values, sample sizes, and specifics on the distance measures and any corrections applied for multiple comparisons.

    Revision: yes

  2. Referee: [Results] Odor map of essential oils: The observation that 'essential oils within the same group are closely located' does not establish that proximity corresponds to human evaluation, because the groups are not shown to be defined by independent human perceptual data, and no held-out human similarity ratings for the oils are collected to test whether Euclidean distances on the map predict those ratings.

    Authors: The groups are based on standard classifications from the perfumery literature that reflect human sensory consensus. We will expand the text to cite the origins of these groupings and note their perceptual basis. We agree that held-out validation would strengthen the claim but note that our study did not collect new human data for the oils; we will add a discussion of this as a limitation and direction for future research.

    Revision: partial

  3. Referee: [Methods] The manuscript does not specify the LLM model, prompt templates, exact similarity computation from LLM outputs, or the embedding algorithm used to generate the odor map (e.g., MDS parameters), preventing assessment of reproducibility and potential confounds in the distance measures.

    Authors: We appreciate this feedback on reproducibility. The revised methods section will detail the LLM model, full prompt templates, the procedure for extracting similarity scores and computing distances, and the MDS embedding parameters including dimensionality and optimization criteria.

    Revision: yes

standing simulated objections not resolved
  • Collecting new held-out human similarity ratings for the essential oils to directly validate map distances would require a separate human study, which we cannot undertake as part of this revision.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper derives LLM pairwise distances on odor descriptors, compares them statistically to an external human dataset (Dravnieks), then re-uses the LLM distance method on essential-oil names to produce an embedding map. Predefined groups are observed to cluster, but this is an empirical observation rather than a fitted prediction or self-referential definition. No parameters are tuned on the essential-oil data itself, no self-citations carry the central claim, and the chain does not reduce to its own inputs by construction. The process is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract. The work relies on standard LLM prompting and distance calculations whose details are not supplied.


