
arxiv: 2604.06674 · v1 · submitted 2026-04-08 · 💻 cs.CL · cs.AI

Between Century and Poet: Graph-Based Lexical Semantic Change in Persian Poetry

Pith reviewed 2026-05-10 17:50 UTC · model grok-4.3

keywords lexical semantic change · Persian poetry · graph analysis · Word2Vec embeddings · neighborhood rewiring · digital humanities · historical semantics · poet-specific variation

The pith

Semantic change in Persian poetry appears as the rewiring of local word neighborhoods in graphs rather than abstract vector drift.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how words in Persian poetry shift meaning across centuries and poets. It builds aligned Word2Vec embeddings and analyzes the graphs formed by each word's neighboring terms, tracking three processes: gaining or losing neighbors, changing bridge roles between communities, and moving between semantic clusters. The analysis covers twenty target words anchored by five reference terms: Earth, Night, two wine terms, and Heart. The approach treats change as relational rewiring within local semantic structures instead of isolated displacement in embedding space. This matters for readers interested in literary history because it aligns computational results more closely with how poems actually use words in changing constellations of affect, court, and mysticism.

Core claim

Treating lexical history as the rewiring of local semantic graphs (measured through neighbor gain and loss, bridge roles, and community movement) reveals distinct patterns: Night is more time-sensitive, Earth more poet-sensitive, Heart shows continuity despite graph-role mobility, and the two wine terms differ in semantic breadth and stability. A lexical audit confirms that the corpus contains historically driven terms, poet-specific usages, and sparsely attested mystical vocabulary that calls for caution. Overall, the paper argues that this graph-based view captures semantic change in Persian poetry better than vector displacement alone.

What carries the argument

Graph-based neighborhood analysis on aligned Word2Vec spaces, tracking neighbor gain/loss, bridge roles, and community movement across centuries and poets.
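The neighbor gain and loss described above can be illustrated with a minimal sketch. It assumes that a word's neighborhood is its top-k cosine neighbors and that turnover is one minus the Jaccard overlap of two neighbor sets; the paper summary does not state either definition, and the vectors, the value of `k`, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def top_k_neighbors(target, vectors, k):
    """Return the set of the k words most cosine-similar to `target`."""
    others = [w for w in vectors if w != target]
    ranked = sorted(others, key=lambda w: cosine(vectors[target], vectors[w]), reverse=True)
    return set(ranked[:k])

def neighbor_change(old, new):
    """Gained words, lost words, and turnover (assumed here: 1 - Jaccard overlap)."""
    gained, lost = new - old, old - new
    union = old | new
    turnover = 1 - len(old & new) / len(union) if union else 0.0
    return gained, lost, turnover

# Hypothetical 2-D toy vectors for two periods (not data from the paper).
period_a = {"night": (1.0, 0.0), "moon": (0.9, 0.1), "sleep": (0.8, 0.3),
            "wine": (0.0, 1.0), "sorrow": (0.2, 0.9)}
period_b = {"night": (0.3, 1.0), "moon": (0.9, 0.1), "sleep": (0.8, 0.3),
            "wine": (0.0, 1.0), "sorrow": (0.2, 0.9)}

old = top_k_neighbors("night", period_a, k=2)
new = top_k_neighbors("night", period_b, k=2)
gained, lost, turnover = neighbor_change(old, new)
print(sorted(gained), sorted(lost), turnover)
```

In this toy case only the vector for "night" moves, yet its whole neighborhood is replaced, which is the kind of relational change the graph view is meant to expose.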

If this is right

  • Night exhibits stronger shifts tied to historical periods than to individual poetic voices.
  • Earth displays more variation linked to specific poets' stylistic choices.
  • Heart maintains overall continuity while its position in the graph changes.
  • The broader wine term (mey) is more diffuse and variable than the narrower one (baadeh).
  • The method highlights persistence, migration, mediation, and selective transformation in literary language.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same rewiring lens could be tested on non-Persian poetic traditions to see whether neighborhood stability varies by language or genre.
  • Corpus builders might prioritize denser sampling of certain centuries or poets to reduce sparsity effects on bridge-role measurements.
  • Literary scholars could use these graphs to flag candidate passages where a word's relational context has shifted, then verify them by close reading.

Load-bearing premise

The aligned embeddings and derived graph metrics reflect genuine historical semantic shifts rather than artifacts of corpus sparsity, alignment errors, or poet-specific style.

What would settle it

A side-by-side manual reading of word contexts from different centuries and poets that shows no corresponding pattern of neighbor gain, loss, or role change matching the graph metrics.

Figures

Figures reproduced from arXiv: 2604.06674 by Kourosh Shahnazari, Mohammadali Keshtparvar, Seyed Moein Ayyoubzadeh.

Figure 1: Mean embedding drift against mean neighbor turnover for all twenty target words. Colors distinguish the recurrent reference words, the broader symbolic field, and the explicitly mystical layer; open diamonds mark words whose poet-side coverage remains thin. The most volatile edge of the plane is occupied not only by mey / Wine but also by darvish / Dervish, haqiqat / Truth, tariqat / Path, and soofi / Sufi…
Figure 2: Normalized component profile for the full word set, ordered by century-side signal. The columns report drift, turnover, graph-role volatility, century signal, and poet signal; a dagger marks thin poet-side coverage.
Figure 3: Century-wise graph-role trajectories for all twenty target words. The upper heatmap retains absolute century values, while the lower heatmap centers each century on its mean so positive and negative values mark relative prominence within that period. Century 3 is omitted because its exceptionally thin graph does not sustain comparison with the denser periods that follow.
Figure 3 (continued): Century-wise graph-role trajectories for the full word set.
Figure 4: Transition-level dynamics for the full target set. The upper panel reports neighbor turnover between adjacent centuries; the lower panel reports the extent of community reallocation, measured as one minus the overlap between adjacent communities.
Figure 5: Local century-to-century drift, deviation from the full-corpus reference model, and their agreement profile for all twenty target words. Adjacent drift is assigned to its resulting century. The lower panel classifies each word-period pairing by the joint behavior of the two measures, distinguishing stable usage, local fluctuation, robust change, and settled departure.
Figure 6: Overall similarity structure across the poet set, shown as a centered version of the raw mean-cosine matrix. Positive values indicate poet pairs whose affinity exceeds each poet's general similarity to the wider set; negative values indicate comparatively weaker affinity.
Figure 7: Direct comparison between mey / Wine and baadeh / Wine. Both remain historically active, but baadeh / Wine keeps a narrower convivial neighborhood while mey / Wine disperses across a broader symbolic field.
Figure 8: Direct comparison of century-side and poet-side signal for all twenty target words. The bar panel compares the composite signal scores, and the plane situates the lexical panel across time-sensitive, mixed, and poet-sensitive regions.
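The four labels named in the Figure 5 caption suggest a two-by-two reading of adjacent drift against deviation from the reference model. The sketch below shows one way such a classification could work. The mapping from the two measures to the four labels and both cut-off values are assumptions made for illustration; the captions do not state how the paper draws these boundaries.

```python
def classify_pairing(adjacent_drift, reference_deviation,
                     drift_cut=0.3, deviation_cut=0.3):
    """Label one word-period pairing from two measures.

    Assumed mapping (not taken from the paper):
      low drift,  low deviation  -> stable usage
      high drift, low deviation  -> local fluctuation
      high drift, high deviation -> robust change
      low drift,  high deviation -> settled departure
    The cut-offs are hypothetical defaults.
    """
    moved_locally = adjacent_drift >= drift_cut
    far_from_reference = reference_deviation >= deviation_cut
    if moved_locally and far_from_reference:
        return "robust change"
    if moved_locally:
        return "local fluctuation"
    if far_from_reference:
        return "settled departure"
    return "stable usage"

# Hypothetical measure pairs, one per label.
labels = [
    classify_pairing(0.1, 0.1),
    classify_pairing(0.5, 0.1),
    classify_pairing(0.5, 0.6),
    classify_pairing(0.1, 0.6),
]
print(labels)
```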
Original abstract

Meaning in Persian poetry is both historical and relational. Words persist through literary tradition while shifting their force through changing constellations of neighbors, rhetorical frames, and poetic voices. This study examines that process using aligned Word2Vec spaces combined with graph-based neighborhood analysis across centuries and major poets. Rather than modeling semantic change as vector displacement alone, it treats lexical history as the rewiring of local semantic graphs: the gain and loss of neighbors, shifts in bridge roles, and movement across communities. The analysis centers on twenty target words, anchored by five recurrent reference terms: Earth, Night, two wine terms, and Heart. Surrounding them are affective, courtly, elemental, and Sufi concepts such as Love, Sorrow, Dervish, King, Annihilation, and Truth. These words exhibit distinct patterns of change. Night is more time-sensitive, Earth more poet-sensitive, and Heart shows continuity despite graph-role mobility. The two wine terms highlight probe sensitivity: one is broad and semantically diffuse, while the other is narrower and more stable. A lexical audit confirms that the corpus contains historically driven terms, poet-specific usages, and sparsely attested mystical vocabulary requiring caution. Overall, semantic change in Persian poetry is better captured as neighborhood rewiring than as abstract drift. For Digital Humanities, this approach restores local structure to computational analysis and supports interpretations closer to literary practice: persistence, migration, mediation, and selective transformation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript claims that semantic change in Persian poetry is better modeled as rewiring of local semantic graphs (neighbor gain/loss, bridge roles, community movement) from aligned Word2Vec embeddings than as vector displacement. It analyzes twenty target words anchored by reference terms (Earth, Night, two wine terms, Heart) and surrounding concepts (Love, Sorrow, etc.), identifying patterns such as time-sensitivity for Night, poet-sensitivity for Earth, and continuity with mobility for Heart. A lexical audit notes historical, poet-specific, and sparse terms, leading to the conclusion that the graph-based approach better supports literary interpretations of persistence and transformation in digital humanities.

Significance. If the central claim holds after validation, this would contribute to computational literary studies by demonstrating a relational, graph-based method for diachronic semantics in a sparse, stylistically heterogeneous non-English corpus. It could encourage DH work that prioritizes local neighborhood structure over global vector metrics, offering interpretations closer to traditional literary analysis. The qualitative patterns on specific Persian poetic terms provide concrete examples that might seed further case studies, though the current absence of quantitative support reduces immediate significance.

major comments (3)
  1. [Abstract] Abstract (final sentence): The claim that 'semantic change in Persian poetry is better captured as neighborhood rewiring than as abstract drift' is not supported by any quantitative comparison, baseline (e.g., cosine drift on the same aligned embeddings), statistical test, or error analysis, which is load-bearing for the overall interpretive conclusion.
  2. [Pipeline description] Pipeline description: No details are given on Word2Vec hyperparameters, alignment procedure between century- or poet-specific spaces, graph construction (neighbor definition, edge thresholds), or stability checks, preventing assessment of whether reported rewiring reflects genuine shifts or artifacts from corpus sparsity and alignment distortions.
  3. [Analysis of the twenty target words] Analysis of the twenty target words: Distinct patterns (e.g., Night time-sensitive, Earth poet-sensitive, wine terms differing in diffuseness) are presented qualitatively without error analysis, baseline comparisons showing graph metrics outperform vector displacement, or quantitative validation of the metrics, leaving the central claim on unshown evidence.
minor comments (2)
  1. [Abstract] The abstract references 'five recurrent reference terms' and 'twenty target words' but does not list them explicitly or indicate the number of centuries/poets covered, which would improve reader orientation.
  2. Consider adding a summary table of observed graph changes per target word to clarify the qualitative findings and facilitate comparison.
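The baseline the referee asks for in major comment 1, cosine drift on the same aligned embeddings, is simple to compute once each word has one vector per period. A minimal sketch follows, assuming the vectors are already aligned and keyed by century number; the toy trajectory and the function names are hypothetical.

```python
import math

def cosine_drift(vec_prev, vec_next):
    """Return 1 - cosine similarity between a word's vectors in two aligned spaces."""
    dot = sum(a * b for a, b in zip(vec_prev, vec_next))
    norm = math.sqrt(sum(a * a for a in vec_prev)) * math.sqrt(sum(b * b for b in vec_next))
    return 1.0 - dot / norm

def drift_series(vectors_by_period):
    """Drift between each pair of adjacent periods, in chronological order."""
    periods = sorted(vectors_by_period)
    return [cosine_drift(vectors_by_period[a], vectors_by_period[b])
            for a, b in zip(periods, periods[1:])]

# Hypothetical aligned vectors for one word, keyed by century (not from the paper).
trajectory = {4: (1.0, 0.0), 5: (1.0, 0.0), 6: (0.0, 1.0)}
drifts = drift_series(trajectory)
print(drifts)
```

Reporting this series next to a graph metric for the same word and periods would give the side-by-side comparison that the comment finds missing.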

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive comments, which highlight important areas for strengthening the manuscript's methodological transparency and evidential basis. We address each major comment point by point below, indicating where revisions will be made.

  1. Referee: [Abstract] Abstract (final sentence): The claim that 'semantic change in Persian poetry is better captured as neighborhood rewiring than as abstract drift' is not supported by any quantitative comparison, baseline (e.g., cosine drift on the same aligned embeddings), statistical test, or error analysis, which is load-bearing for the overall interpretive conclusion.

    Authors: We agree that the abstract's concluding sentence advances a comparative claim without accompanying quantitative support in the current version. The manuscript's contribution is primarily interpretive, illustrating how graph-based neighborhood analysis can yield readings more congruent with literary scholarship on relational meaning in Persian poetry. To address this, we will revise the abstract to frame the claim as an interpretive outcome supported by the case studies rather than a statistically demonstrated superiority, and we will add a brief comparative subsection in the results that reports cosine similarity drifts alongside the graph metrics for the twenty target words. revision: partial

  2. Referee: [Pipeline description] Pipeline description: No details are given on Word2Vec hyperparameters, alignment procedure between century- or poet-specific spaces, graph construction (neighbor definition, edge thresholds), or stability checks, preventing assessment of whether reported rewiring reflects genuine shifts or artifacts from corpus sparsity and alignment distortions.

    Authors: This observation is correct and points to a clear omission in the submitted version. The original text prioritized the literary analysis over technical specification. In the revised manuscript we will insert a dedicated Methods section that specifies the Word2Vec hyperparameters (vector dimension, window size, number of epochs, and training regime), the alignment procedure used between the century- and poet-specific spaces, the precise definition of graph neighbors and any edge-weight thresholds, and the stability checks performed across multiple training runs. revision: yes

  3. Referee: [Analysis of the twenty target words] Analysis of the twenty target words: Distinct patterns (e.g., Night time-sensitive, Earth poet-sensitive, wine terms differing in diffuseness) are presented qualitatively without error analysis, baseline comparisons showing graph metrics outperform vector displacement, or quantitative validation of the metrics, leaving the central claim on unshown evidence.

    Authors: The patterns are derived from systematic inspection of the aligned graphs and are cross-validated against the lexical audit and existing literary scholarship. We present them as illustrative case studies rather than as statistically validated superiority of one metric class over another. We will augment the analysis section with a table that juxtaposes selected graph metrics (neighbor turnover, betweenness change) against cosine drift values for the same words, together with notes on sparse or historically variable terms that limit quantitative precision. revision: partial
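One way to read the promised table of graph metrics against cosine drift is to ask whether the two rankings of words agree. The sketch below computes a Spearman rank correlation in plain Python. The word labels and scores are hypothetical placeholders, not values from the paper, and the choice of Spearman correlation is an assumption about how such a comparison might be summarized.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical scores for four placeholder words (not results from the paper).
words = ["word_a", "word_b", "word_c", "word_d"]
drift = [0.12, 0.30, 0.22, 0.41]
turnover = [0.50, 0.35, 0.60, 0.20]

for w, d, t in zip(words, drift, turnover):
    print(f"{w:8s} drift={d:.2f} turnover={t:.2f}")
rho = spearman(drift, turnover)
print(f"Spearman rho = {rho:.2f}")
```

A correlation near 1 would suggest the graph metric adds little beyond drift for ranking words, while a low or negative value would indicate the two measures pick out different words.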

Circularity Check

0 steps flagged

No circularity: purely empirical description of graph metrics on aligned embeddings


The paper conducts an empirical analysis of lexical change in Persian poetry by training and aligning Word2Vec embeddings, constructing graphs from nearest-neighbor relations, and qualitatively describing patterns such as neighbor gain/loss and community movement for twenty target words. No equations, derivations, or fitted parameters are presented as predictions; the central claim that neighborhood rewiring captures change better than vector drift is an interpretive conclusion drawn from the observed graphs rather than a quantity forced by the method's own definitions or self-citations. The work contains no self-definitional loops, no renaming of known results as novel unifications, and no load-bearing reliance on prior author work that would reduce the findings to inputs by construction. The analysis is self-contained as data-driven observation.
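The "bridge roles" mentioned throughout can be made concrete with betweenness centrality, which counts how many shortest paths between other words pass through a given word. This is a sketch under assumptions: the review does not say which bridge measure the paper uses (the rebuttal mentions "betweenness change"), and the small graph below is a hypothetical neighborhood, not one extracted from the corpus. The function implements Brandes' algorithm for an undirected, unweighted graph.

```python
from collections import deque

def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    `graph` maps each node to an iterable of its neighbors and must be symmetric.
    """
    scores = {node: 0.0 for node in graph}
    for source in graph:
        stack = []
        preds = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}   # number of shortest paths from source
        dist = {node: -1 for node in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:  # breadth-first search from source
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in graph}
        while stack:  # accumulate dependencies from the farthest nodes inward
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each undirected pair is counted from both endpoints, so halve the totals.
    return {node: value / 2 for node, value in scores.items()}

# Hypothetical neighborhood: "heart" links an affective pair to a courtly pair.
toy_graph = {
    "love": {"sorrow", "heart"},
    "sorrow": {"love", "heart"},
    "heart": {"love", "sorrow", "king", "court"},
    "king": {"court", "heart"},
    "court": {"king", "heart"},
}
bridge = betweenness(toy_graph)
print(bridge)
```

Here "heart" lies on every shortest path between the two pairs, so it is the only node with a nonzero score; a change in such a score between periods is one possible reading of a shift in bridge role.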

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on standard assumptions about embedding models and graph metrics. It introduces no free parameters or invented entities; the two axioms listed below are domain assumptions, and the remaining choices concern the reference words and corpus selection.

axioms (2)
  • domain assumption Aligned Word2Vec spaces preserve comparable semantic neighborhoods across historical periods
    Invoked when treating neighbor changes as semantic rewiring rather than alignment artifacts.
  • domain assumption Graph metrics (neighbor gain/loss, bridge roles, community membership) capture meaningful lexical semantic change
    Central to the claim that rewiring is superior to abstract drift.
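The first axiom depends on what alignment does to the spaces. The review notes that the paper does not specify its alignment procedure, so the sketch below is only an illustration of the general idea: it fits the rotation that best maps one 2-D space onto another in the least-squares sense, a restricted special case of the orthogonal Procrustes alignment that is common in diachronic embedding work. The anchor points and function names are hypothetical.

```python
import math

def rotate(point, angle):
    """Rotate a 2-D point counterclockwise by `angle` radians."""
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

def fit_rotation(source_points, target_points):
    """Angle of the 2-D rotation minimizing squared distance from source to target."""
    dot = sum(sx * tx + sy * ty
              for (sx, sy), (tx, ty) in zip(source_points, target_points))
    cross = sum(sx * ty - sy * tx
                for (sx, sy), (tx, ty) in zip(source_points, target_points))
    return math.atan2(cross, dot)

# Hypothetical anchor words whose geometry is identical in both periods,
# except that the earlier space is rotated by 30 degrees.
later_space = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
earlier_space = [rotate(p, math.pi / 6) for p in later_space]

angle = fit_rotation(later_space, earlier_space)
aligned = [rotate(p, angle) for p in later_space]
print(round(math.degrees(angle), 6))
```

Because a rotation preserves all pairwise cosines within a space, neighbor sets computed inside each period are unaffected by it; alignment matters for cross-period measures such as drift, which is why alignment error is a more direct threat to drift than to within-period neighborhoods.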



