pith. machine review for the scientific record.

arxiv: 2604.18296 · v1 · submitted 2026-04-20 · 💻 cs.CL


Exploring Concreteness Through a Figurative Lens


Pith reviewed 2026-05-10 05:26 UTC · model grok-4.3

classification 💻 cs.CL
keywords concreteness · figurative language · LLM representations · layer-wise analysis · geometric direction · metaphor · representation space · steering generation

The pith

LLMs encode concreteness as a single, consistent one-dimensional direction in their mid-to-late layers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper investigates how large language models internally represent word concreteness, which can shift between literal and figurative senses depending on context. It shows that early layers separate literal from figurative uses of the same noun while mid-to-late layers compress the distinction into a single direction shared across model families. A sympathetic reader would care because this simple geometric organization turns a complex semantic property into something directly usable for classifying figurative language and adjusting generation without retraining.

Core claim

The authors demonstrate that LLMs separate literal and figurative usage in early layers, and that mid-to-late layers compress concreteness into a one-dimensional direction that is consistent across models. This geometric structure supports efficient figurative-language classification and enables training-free steering of generation toward more literal or more figurative rewrites.

What carries the argument

The one-dimensional concreteness direction in hidden representation space that organizes literal versus figurative interpretations of nouns.
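As a sketch of what such a direction is, the simplest construction is a difference-of-means (DiffMean) estimator over paired hidden states; the function name, array shapes, and toy data below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def concreteness_direction(h_concrete, h_abstract):
    """Estimate a single 'concreteness' direction as the normalized
    difference of mean hidden states (a DiffMean-style estimator).

    h_concrete, h_abstract: (n_examples, hidden_dim) arrays of one
    layer's hidden states for tokens in concrete vs. abstract contexts.
    """
    d = h_concrete.mean(axis=0) - h_abstract.mean(axis=0)
    return d / np.linalg.norm(d)

# Toy illustration: two clusters separated along one axis of a 4-d
# "hidden space" stand in for literal vs. figurative contexts.
rng = np.random.default_rng(0)
axis = np.array([1.0, 0.0, 0.0, 0.0])
h_conc = rng.normal(0, 0.1, (50, 4)) + axis
h_abst = rng.normal(0, 0.1, (50, 4)) - axis
d = concreteness_direction(h_conc, h_abst)
# d is unit-norm and points approximately along the separating axis
```

On real models the inputs would be hidden states collected at a single layer for matched stimulus pairs; the estimator itself stays one line.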

If this is right

  • Early layers perform the separation between literal and figurative contexts.
  • A single direction vector enables training-free classification of figurative language.
  • Manipulating the direction allows steering of generated text toward literal or figurative styles.
  • The structure remains consistent across four different model families.
  • This geometry provides a practical handle on context-dependent semantics in representation space.
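The steering item above amounts, in sketch form, to adding a scaled copy of the direction to a layer's activations; the function name, the sign convention, and the scale `alpha` are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def steer(hidden, direction, alpha):
    """Shift hidden states along a concreteness direction.

    hidden:    (seq_len, hidden_dim) activations at one layer
    direction: (hidden_dim,) unit vector; in this sketch +alpha pushes
               toward the concrete/literal end, -alpha toward figurative.
    """
    return hidden + alpha * direction

# Toy check: the projection onto the direction moves by exactly alpha.
rng = np.random.default_rng(1)
d = np.zeros(8)
d[0] = 1.0
h = rng.normal(size=(5, 8))
h2 = steer(h, d, alpha=3.0)
shift = (h2 - h) @ d  # per-token change in projection
```

In a real model this edit would be applied inside a forward hook at the chosen mid-to-late layers; everything orthogonal to the direction is left untouched.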

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same low-dimensional compression might appear for other shifting semantic properties such as specificity or sentiment.
  • Identifying such directions could supply lightweight interpretability tools for controlling stylistic output in deployed systems.
  • The pattern may generalize to additional figurative phenomena like idioms or sarcasm if tested on broader datasets.
  • If the direction proves stable, it could reduce reliance on supervised fine-tuning for style control tasks.

Load-bearing premise

The observed one-dimensional direction genuinely reflects the model's internal handling of concreteness rather than an artifact of training data statistics or the specific choice of nouns and contexts examined.

What would settle it

If the same direction vector fails to reliably classify new literal-versus-figurative examples or to steer generation in the predicted direction across additional word sets or models, the claim would be falsified.
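That falsification test can be operationalized as a projection-then-AUROC check on held-out items; this is a minimal sketch that assumes labeled held-out hidden states and synthesizes toy data in their place.

```python
import numpy as np

def auroc(pos, neg):
    """AUROC = P(score_pos > score_neg), by exhaustive pairwise comparison."""
    pos, neg = np.asarray(pos), np.asarray(neg)
    gt = (pos[:, None] > neg[None, :]).sum()
    eq = (pos[:, None] == neg[None, :]).sum()
    return (gt + 0.5 * eq) / (len(pos) * len(neg))

# Toy held-out set: project hidden states onto a fixed direction and
# score how well the projections separate literal from figurative items.
rng = np.random.default_rng(2)
d = np.zeros(4)
d[0] = 1.0
h_lit = rng.normal(0, 0.1, (40, 4)) + d   # "literal" held-out items
h_fig = rng.normal(0, 0.1, (40, 4)) - d   # "figurative" held-out items
score = auroc(h_lit @ d, h_fig @ d)
# The paper's claim predicts scores well above 0.5 on genuinely new
# word sets; scores near chance would falsify it.
```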

Figures

Figures reproduced from arXiv: 2604.18296 by Saptarshi Ghosh, Tianyu Jiang.

Figure 1: Layer-wise Pearson correlation between static …
Figure 3: Layer-wise AUROC for separating high and …
Figure 4: Mean δ across layers in Llama-3.1-8B, for verbs. Early high separation is followed by moderate to low separation in the middle to later layers.
Figure 5: Word frequency distribution in the 25,000 …
Figure 7: AUROC score for classifying high and low …
Figure 8: Prompt for generating contextual concreteness …
Figure 9: Prompt for generating static concreteness …
Figure 10: Pearson correlation between embeddings of models and concreteness scores from …
Figure 11: Mean δ for all layers for remaining models.
Figure 12: AUROC score for classifying high and low concrete nouns using one-directional geometric subspace for …
Figure 13: Prompt for generating static concreteness token.
Figure 14: Annotation guidelines.
Original abstract

Static concreteness ratings are widely used in NLP, yet a word's concreteness can shift with context, especially in figurative language such as metaphor, where common concrete nouns can take abstract interpretations. While such shifts are evident from context, it remains unclear how LLMs understand concreteness internally. We conduct a layer-wise and geometric analysis of LLM hidden representations across four model families, examining how models distinguish literal vs figurative uses of the same noun and how concreteness is organized in representation space. We find that LLMs separate literal and figurative usage in early layers, and that mid-to-late layers compress concreteness into a one-dimensional direction that is consistent across models. Finally, we show that this geometric structure is practically useful: a single concreteness direction supports efficient figurative-language classification and enables training-free steering of generation toward more literal or more figurative rewrites.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript conducts a layer-wise geometric analysis of LLM hidden representations to examine how models encode concreteness in literal versus figurative uses of the same nouns. Across four model families, it reports that early layers separate literal and figurative usages while mid-to-late layers compress concreteness into a consistent one-dimensional direction; this direction is then shown to support training-free figurative-language classification and steering of generation toward more literal or figurative outputs.

Significance. If the geometric claims hold after controls, the work advances interpretability by identifying a compressed, cross-model axis for a context-dependent semantic property. The practical demonstrations of classification and steering add applied value, and the consistency finding (if not reducible to stimulus artifacts) would be a useful benchmark for representation geometry studies.

major comments (2)
  1. [§4.2] Cross-model consistency analysis: the reported one-dimensional concreteness direction and its alignment across models may be driven by co-occurrence statistics in the paired literal/figurative noun stimuli rather than by an internal model property. Because the same nouns appear in both conditions, any systematic difference in training-data contexts (syntactic frames, collocates, or sentiment) can induce a spurious linear direction; without ablations on unmatched nouns, frequency-matched controls, or out-of-distribution items, the generality claim is not yet adequately supported.
  2. [§3.3] Stimulus and direction extraction: the method for extracting the concreteness direction (difference vectors, linear probes, or PCA) is not shown to be invariant to the specific choice of figurative contexts. The central compression claim requires evidence that the direction remains stable when the literal/figurative contrast is decorrelated from lexical identity; the current paired design leaves this open.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by reporting at least one key quantitative result (e.g., classification accuracy or steering success rate with error bars) rather than qualitative descriptions alone.
  2. [Figures] Figure captions for the layer-wise plots should explicitly state the number of stimuli, models, and the precise metric used to measure 'consistency' of the direction (e.g., cosine similarity threshold or alignment score).
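The "consistency" metric the second minor comment asks for could be made explicit as a cosine-similarity stability check between directions extracted from disjoint stimulus splits; the splits, dimensions, and sampling below are synthetic stand-ins, not the paper's stimuli or metric.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two direction vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def diffmean(h_pos, h_neg):
    """Unit-norm difference-of-means direction."""
    d = h_pos.mean(axis=0) - h_neg.mean(axis=0)
    return d / np.linalg.norm(d)

# Two disjoint stimulus splits drawn from the same underlying contrast:
# a stable direction should be nearly identical across splits.
rng = np.random.default_rng(3)
axis = np.zeros(16)
axis[0] = 1.0

def sample_split(n):
    return (rng.normal(0, 0.2, (n, 16)) + axis,
            rng.normal(0, 0.2, (n, 16)) - axis)

d_a = diffmean(*sample_split(100))
d_b = diffmean(*sample_split(100))
stability = cosine(d_a, d_b)  # near 1.0 when stimulus-invariant
```

The same scalar, reported with a threshold (e.g. cosine above some cutoff), would make the cross-model and cross-split consistency claims directly checkable.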

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which raises valid points about potential stimulus confounds in our geometric analysis. We address each major comment below and commit to revisions that strengthen the claims without overstating current evidence.

Point-by-point responses
  1. Referee: [§4.2] Cross-model consistency analysis: the reported one-dimensional concreteness direction and its alignment across models may be driven by co-occurrence statistics in the paired literal/figurative noun stimuli rather than by an internal model property. Because the same nouns appear in both conditions, any systematic difference in training-data contexts (syntactic frames, collocates, or sentiment) can induce a spurious linear direction; without ablations on unmatched nouns, frequency-matched controls, or out-of-distribution items, the generality claim is not yet adequately supported.

    Authors: We agree that the paired-noun design, while controlling for lexical identity, permits possible co-occurrence confounds. The paired approach was selected to isolate context-driven concreteness shifts for identical nouns, following standard practices in figurative-language studies. Cross-model alignment of the direction offers partial support against pure artifacts, as training-data differences across families make identical spurious directions unlikely. To directly test this, we will add ablations using unmatched nouns (frequency- and concreteness-matched but unpaired) and out-of-distribution items, reporting whether the 1D direction and its cross-model consistency persist. These controls will be included in the revised manuscript. revision: yes

  2. Referee: [§3.3] Stimulus and direction extraction: the method for extracting the concreteness direction (difference vectors, linear probes, or PCA) is not shown to be invariant to the specific choice of figurative contexts. The central compression claim requires evidence that the direction remains stable when the literal/figurative contrast is decorrelated from lexical identity; the current paired design leaves this open.

    Authors: The difference-vector method was chosen precisely to decorrelate usage from noun identity within each pair. We acknowledge that full decorrelation from lexical identity requires additional tests. In revision we will add an invariance check: extract the direction from one set of figurative contexts, then evaluate classification and steering performance on held-out figurative contexts and unmatched nouns. We will also compare difference vectors against PCA and linear-probe variants to demonstrate method robustness. These results will be reported to support the compression claim. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical analysis of representations

Full rationale

The paper conducts a layer-wise geometric analysis of existing LLM hidden states on literal vs. figurative noun pairs, reporting an observed one-dimensional direction in mid-to-late layers without any equations, derivations, or fitted parameters that reduce the claimed structure to its inputs by construction. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the provided abstract or description; the consistency across models and downstream utility are presented as empirical observations rather than constructed results. The work remains self-contained as an analysis of model behavior on the chosen stimuli.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract relies on the standard domain assumption that hidden representations encode semantic distinctions such as concreteness, but introduces no free parameters, new axioms, or invented entities.

axioms (1)
  • domain assumption: Hidden representations in LLMs encode semantic properties, including concreteness and its contextual shifts.
    This assumption underpins the layer-wise and geometric analysis described in the abstract.

pith-pipeline@v0.9.0 · 5435 in / 1260 out tokens · 40841 ms · 2026-05-10T05:26:55.790759+00:00 · methodology

discussion (0)

