arxiv: 2606.26360 · v1 · pith:EJZK6OOKnew · submitted 2026-06-24 · 💻 cs.CL

Phonetic and semantic analyses of spoken corpora of Beijing and Taiwan Mandarin indicate that the neutral tone is a lexical tone

Pith reviewed 2026-06-26 01:24 UTC · model grok-4.3

classification 💻 cs.CL
keywords neutral tone · lexical tone · Mandarin Chinese · corpus analysis · pitch contours · contextualized embeddings · Beijing Mandarin · Taiwan Mandarin

The pith

The neutral tone in Mandarin is a lexical tone with its own target and word-specific signatures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Corpus analyses of spontaneous speech in Beijing and Taiwan Mandarin show that the neutral tone has an independent tonal target like the four standard lexical tones. In two-syllable words the neutral tone's pitch contour depends on the tone of the first syllable, matching the pattern documented for lexical tones on the second syllable. Words carrying the neutral tone also display word-specific pitch signatures that correlate with their contextualized embeddings, a property previously reported only for lexical tones. These phonetic and distributional parallels lead the authors to classify the neutral tone as lexical rather than reduced or toneless in both varieties.
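The first-syllable dependence described above can be illustrated with a purely descriptive sketch: resample each token's F0 track onto a normalized time axis, take log pitch, and average by tone pattern. This is not the authors' GAM analysis; the function names, the number of sample points, and the toy F0 values are all hypothetical.

```python
import numpy as np

def time_normalize(f0_hz, n_points=20):
    """Resample one F0 track (Hz) to n_points log-pitch values on a 0..1 time axis."""
    f0 = np.asarray(f0_hz, dtype=float)
    src = np.linspace(0.0, 1.0, num=f0.size)   # original sample positions
    dst = np.linspace(0.0, 1.0, num=n_points)  # normalized-time grid
    return np.interp(dst, src, np.log(f0))     # linear interpolation in log pitch

def mean_contour_by_pattern(tokens, n_points=20):
    """tokens: iterable of (tone_pattern, f0_track). Returns {pattern: mean contour}."""
    grouped = {}
    for pattern, track in tokens:
        grouped.setdefault(pattern, []).append(time_normalize(track, n_points))
    return {p: np.mean(np.vstack(rows), axis=0) for p, rows in grouped.items()}

# Hypothetical toy tokens (invented values, not corpus data).
toy_tokens = [
    ("T1-T5", [220, 222, 221, 200, 180]),
    ("T1-T5", [230, 231, 229, 210, 190, 185]),
    ("T4-T5", [260, 230, 200, 180, 170]),
]
means = mean_contour_by_pattern(toy_tokens, n_points=10)
```

If the mean contours for, say, T1-T5 and T4-T5 diverge over the second half of the normalized time axis, that is the kind of pattern the first-syllable-dependence claim refers to. Raw averages do not control for speaker, word, or context, which is why the paper relies on GAMs instead.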

Core claim

The neutral tone possesses its own tonal target, exhibits first-syllable-dependent contours in disyllabic words, and carries word-specific pitch signatures predictable from contextualized embeddings, properties that together establish it as a lexical tone in Beijing and Taiwan Mandarin.

What carries the argument

Independent tonal target plus first-syllable dependence and embedding-correlated word-specific pitch signatures

If this is right

  • Neutral-tone words should be modeled with the same tonal inventory and sandhi rules as other lexical-tone words.
  • Variability observed in neutral-tone realizations is expected to match the range seen for the four standard tones.
  • Some differences in neutral-tone realization between Beijing and Taiwan corpora may be traceable to differences in word meanings across the two datasets.
  • Single-syllable neutral-tone words fit the same lexical pattern as disyllabic ones.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Tone-acquisition studies may need to treat the neutral tone as one of five lexical tones rather than a separate reduced category.
  • Speech technology systems could improve by generating neutral-tone contours from the same embedding-based mechanisms used for lexical tones.
  • Other putatively toneless or reduced syllables in Mandarin and related languages could be re-examined with the same embedding-correlation method.

Load-bearing premise

Possession of an independent tonal target, first-syllable-dependent contours, and word-specific pitch signatures correlated with embeddings is sufficient to classify a tone as lexical rather than reduced or toneless.

What would settle it

A new corpus or controlled experiment showing that neutral-tone syllables lack an independent tonal target or that their pitch signatures fail to correlate with contextualized embeddings would falsify the lexical classification.
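One way to make this falsification test concrete is a permutation check: fit a mapping from embeddings to fixed-length pitch vectors, measure held-out error, and compare it with the error obtained after shuffling which embedding goes with which contour. The sketch below uses closed-form ridge regression on synthetic data. The mapping type, the dimensions, and all names are assumptions for illustration, not the procedure reported in the paper.

```python
import numpy as np

def ridge_predict(X_train, Y_train, X_test, alpha=1.0):
    """Fit a ridge-regularized linear map from embeddings to contours; predict test rows."""
    d = X_train.shape[1]
    W = np.linalg.solve(X_train.T @ X_train + alpha * np.eye(d), X_train.T @ Y_train)
    return X_test @ W

def heldout_mse(X, Y, train_frac=0.8, alpha=1.0):
    """Mean squared error on the last (1 - train_frac) share of rows."""
    n_train = int(train_frac * X.shape[0])
    pred = ridge_predict(X[:n_train], Y[:n_train], X[n_train:], alpha)
    return float(np.mean((pred - Y[n_train:]) ** 2))

def permutation_test(X, Y, n_perm=200, seed=0):
    """Compare the observed held-out error with errors after shuffling the embedding rows."""
    rng = np.random.default_rng(seed)
    observed = heldout_mse(X, Y)
    null = np.array([heldout_mse(X[rng.permutation(X.shape[0])], Y)
                     for _ in range(n_perm)])
    # Share of shuffled runs that predict at least as well as the real pairing.
    p_value = (1 + np.sum(null <= observed)) / (1 + n_perm)
    return observed, float(p_value)

# Synthetic stand-ins (hypothetical): 200 tokens, 16-dim "embeddings", 10-point contours.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 16))
W_true = rng.normal(size=(16, 10))
Y = X @ W_true + 0.1 * rng.normal(size=(200, 10))
observed, p_value = permutation_test(X, Y)
```

On real data, a p-value that stays large for neutral-tone words while being small for lexical-tone words would count against the lexical classification. A proper test would also need to hold out whole word types and speakers rather than individual tokens.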

Figures

Figures reproduced from arXiv: 2606.26360 by R. Harald Baayen, Yuxin Lu, Zhexuan Li.

Figure 1
Toy dataset. The left panel shows the F0 contours of five selected tokens with the T4-T5 tone pattern, produced by the same speaker, from the corpus of spoken Beijing Mandarin. The right panel shows the fitted F0 contours predicted by a simple GAM, using a thin plate regression spline smooth for normalized time as predictor.
Figure 2
The average predicted pitch contours for the T1-T5, T2-T5, T3-T5, and T4-T5 tone patterns. The contours are reproduced from the black curves in …
Figure 3
Reduction in AIC when a predictor of interest is added to the baseline GAM. Larger reductions indicate greater variable importance. Results are shown separately for Beijing Mandarin (left panel) and Taiwan Mandarin (right panel). Model formula as extracted (truncated): logpitch ∼ tone_pattern + s(normalized_t, by = tone_pattern, k = 5) + s(normalized_t, speaker, bs = 'fs', m = 1) + s(normalized_t, word, bs = 'fs', m = 1) + s(normalized_t, tonal_…
Figure 4
Estimated concurvity scores for smooth terms in the best-fit GAMs for Beijing Mandarin (left panel) and Taiwan Mandarin (right panel). Lower concurvity values indicate that a predictor is contributing more independently to the model fit. The concurvity scores for the word smooth are highlighted in red. Note. For models with continuous predictors, only the concurvities of smooth terms involving normalized t…
Figure 5
The modulation of tonal context on predicted pitch contours, showing all 36 tonal context levels in Beijing Mandarin (left panel) and Taiwan Mandarin (right panel). The predicted pitch contours are estimated from the partial effect of the factor smooth for tonal context. In both panels, contexts in which the neutral-tone syllable is followed by another neutral tone are highlighted by color. They represent …
Figure 6
Predicted pitch contours for tone patterns in Beijing (left panel) and Taiwan (right panel). The pitch contours shown represent the partial effect of tone pattern, in combination with the corresponding tone-pattern specific intercepts. Separate GAMs were fitted to the two dialects. The vertical dashed line indicates the average syllable boundary on the normalized time scale, with the first half correspondi…
Figure 7
Difference curves for pairs of tone patterns in Beijing Mandarin (left panel) and Taiwan Mandarin (right panel). The red area indicates the normalized time domain where the estimated difference curve is significantly different from zero, whereas the grey area marks the normalized time domain with no significant difference.
Figure 8
Predicted pitch contours for individual words, estimated by combining the partial effect of the factor smooth for word, the partial effect of tone pattern, and the model intercepts. Predictions are based on separate GAMs fitted to the Beijing and Taiwan Mandarin datasets and are overlaid to facilitate comparison. Words that bear neutral tone in both varieties are shown in the upper panel. Words that are r…
Figure 9
Distribution of mean squared difference between word type pairs. The orange box indicates words bearing T5 in both Beijing and Taiwan Mandarin. The green box indicates words in which the second syllable bears T5 in Beijing Mandarin but not in Taiwan Mandarin.
Figure 10
Contextualized embeddings 𝑆beijing and 𝑆taiwan_SC, obtained from a pretrained Chinese Qwen-2.5 model, are shown in a two-dimensional plane obtained with t-SNE. Convex hulls (polygons) highlight the clusters of the ten most frequent word types, labeled by the numbers 1–10. Orange numbers represent tokens in 𝑆beijing, and blue numbers represent tokens in 𝑆taiwan_SC.
Figure 11
Comparison of observed (dashed lines) and predicted pitch contours (solid lines) for the ten words shown in …
Figure 12
CE-predicted pitch contours for tone patterns (solid lines) and GAM-estimated partial effects for tone pattern (dashed lines) for Beijing Mandarin (upper panel) and Taiwan Mandarin (lower panel). For ease of comparison, the GAM-predicted contours are linearly rescaled to match the range of the CE-predicted contours.
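Figure 3 ranks predictors by the reduction in AIC when each is added to a baseline model. The sketch below shows that comparison on synthetic data. As a loudly labeled simplification, it uses ordinary least squares with a Gaussian likelihood as a stand-in for the GAMs in the paper, and every variable name is hypothetical.

```python
import numpy as np

def gaussian_aic(y, X):
    """AIC of a least-squares fit under a Gaussian likelihood (variance estimated by MLE)."""
    n = y.size
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    sigma2 = rss / n                              # maximum-likelihood variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    k = X.shape[1] + 1                            # coefficients plus the variance
    return 2 * k - 2 * loglik

def aic_reduction(y, X_base, extra):
    """AIC(baseline) minus AIC(baseline + extra predictor); larger means more useful."""
    X_full = np.column_stack([X_base, extra])
    return gaussian_aic(y, X_base) - gaussian_aic(y, X_full)

# Synthetic example: one predictor that matters and one that does not.
rng = np.random.default_rng(1)
n = 300
t = rng.uniform(size=n)                           # stand-in for normalized time
useful = rng.normal(size=n)
irrelevant = rng.normal(size=n)
y = 1.0 + 2.0 * t + 1.5 * useful + 0.5 * rng.normal(size=n)
X_base = np.column_stack([np.ones(n), t])
gain_useful = aic_reduction(y, X_base, useful)
gain_irrelevant = aic_reduction(y, X_base, irrelevant)
```

Adding one column can never increase the residual sum of squares, so the reduction here is bounded below by -2, the penalty for the extra parameter. With penalized smooths the effective degrees of freedom replace the simple parameter count, so the numbers from a GAM would not match this simplified calculation.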
Original abstract

The neutral, or floating, tone of Mandarin Chinese is a tone with an enigmatic set of properties. It has been described as a reduced tone, or as a tone that sometimes is lexically fixed but that can also be toneless. In two-syllable words, it is found only on the second syllable, but single-syllable words can also have the neutral tone. We present a corpus-based study of the phonetic realization of the neutral tone in spontaneous conversational speech corpora of Beijing Mandarin and Taiwan Mandarin. We show that the neutral tone has its own tonal target, just as the four lexical tones of Mandarin. We also show that disyllabic words with a neutral tone have pitch contours that have a pitch component that depends on the tone on the first syllable, just as has been observed for two-syllable words with a lexical tone on the second syllable (Chuang et al., 2026). Furthermore, words with a floating tone have word-specific pitch signatures, which have also been documented for single-syllable words (Jin et al., 2026) as well as two-syllable words (Lu et al., 2026b). These word-specific pitch signatures are shown to be predictable to some extent from words' contextualized embeddings, as previously reported for lexical tones (Chuang et al., 2026; Lu et al., 2026b). As there is also considerable variability in the realization of lexical tones, we propose that the neutral tone is, in fact, a lexical tone in both Taiwan Mandarin and Beijing Mandarin. We document both similarities and differences in the realization of the floating tone in these two varieties and provide evidence, using contextualized embeddings, that some of the observed differences may arise from differences in the meanings of the words as used in the two corpora.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims, based on phonetic analyses of spontaneous speech corpora from Beijing and Taiwan Mandarin, that the neutral (floating) tone has an independent tonal target, exhibits first-syllable-dependent pitch contours in disyllables, and shows word-specific pitch signatures predictable from contextualized embeddings—properties also documented for lexical tones. Citing variability in lexical-tone realizations, the authors conclude that the neutral tone is in fact a lexical tone in both varieties, while documenting cross-variety differences that may stem from semantic usage differences.

Significance. If the central claim holds, the result would reclassify the neutral tone within Mandarin phonology, treating it as lexically specified rather than reduced or toneless and thereby affecting models of tone inventory and realization. The corpus approach across two varieties, combined with embedding-based predictability, supplies a potential diagnostic for lexical status and highlights semantic contributions to phonetic variation.

major comments (2)
  1. [final proposal paragraph] The central inference—that possession of an independent target, first-syllable dependence, and embedding-predictable word signatures suffices to reclassify neutral tone as lexical—rests on an untested sufficiency assumption. No explicit criterion is supplied that would rule out a reduced or toneless category displaying the same properties (see the proposal paragraph following the embedding results).
  2. [corpus analysis and embedding sections] Comparisons establishing that neutral-tone contours match lexical-tone patterns in first-syllable dependence and embedding predictability are referenced to Chuang et al. (2026) and Lu et al. (2026b) without reporting here the sample sizes, exclusion criteria, or statistical controls used in those baselines or in the current neutral-tone data, leaving the claimed equivalence difficult to evaluate.
minor comments (2)
  1. [abstract and methods] The abstract and methods description omit quantitative details on token counts per variety, speaker numbers, and pitch-extraction parameters.
  2. [discussion of Beijing–Taiwan differences] Clarify how the reported cross-variety semantic differences were quantified via embeddings and whether they survive controls for lexical frequency.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the strength of our central claim. We respond to each major comment below and indicate the revisions planned for the next version of the manuscript.

Point-by-point responses
  1. Referee: [final proposal paragraph] The central inference—that possession of an independent target, first-syllable dependence, and embedding-predictable word signatures suffices to reclassify neutral tone as lexical—rests on an untested sufficiency assumption. No explicit criterion is supplied that would rule out a reduced or toneless category displaying the same properties (see the proposal paragraph following the embedding results).

    Authors: We agree that the reclassification argument would be strengthened by an explicit criterion for lexical-tone status that rules out a reduced or toneless analysis. The manuscript currently infers lexical status from the combination of an independent target, first-syllable dependence, embedding-predictable word signatures, and the documented variability of lexical tones. In revision we will expand the proposal paragraph to state a set of diagnostic properties drawn from the phonological literature on tone, explain why these properties are taken to favor a lexical rather than reduced analysis, and note the remaining inferential gap. This change directly addresses the sufficiency assumption. revision: yes

  2. Referee: [corpus analysis and embedding sections] Comparisons establishing that neutral-tone contours match lexical-tone patterns in first-syllable dependence and embedding predictability are referenced to Chuang et al. (2026) and Lu et al. (2026b) without reporting here the sample sizes, exclusion criteria, or statistical controls used in those baselines or in the current neutral-tone data, leaving the claimed equivalence difficult to evaluate.

    Authors: We agree that the absence of these methodological details makes the claimed equivalence hard to assess. In the revised manuscript we will add, in the relevant sections, the sample sizes, exclusion criteria, and statistical controls reported in Chuang et al. (2026) and Lu et al. (2026b), together with the corresponding details for the neutral-tone data analyzed here. This will allow readers to evaluate the comparisons directly. revision: yes

Circularity Check

3 steps flagged

Neutral tone reclassified as lexical by matching properties whose diagnostic status for lexical tones is established only via self-citations to overlapping authors' prior work

specific steps
  1. self citation load bearing [Abstract]
    "disyllabic words with a neutral tone have pitch contours that have a pitch component that depends on the tone on the first syllable, just as has been observed for two-syllable words with a lexical tone on the second syllable (Chuang et al., 2026)"

    The property invoked to align neutral tone with lexical tone is taken directly from the authors' own prior paper; the similarity claim therefore reduces to that self-citation rather than an independent observation within the present study.

  2. self citation load bearing [Abstract]
    "These word-specific pitch signatures are shown to be predictable to some extent from words' contextualized embeddings, as previously reported for lexical tones (Chuang et al., 2026; Lu et al., 2026b)"

    The embedding-predictability property used as evidence that neutral tone behaves like lexical tone is referenced exclusively to the authors' own earlier papers, rendering the diagnostic equivalence dependent on the self-citation chain.

  3. self citation load bearing [Abstract]
    "As there is also considerable variability in the realization of lexical tones, we propose that the neutral tone is, in fact, a lexical tone in both Taiwan Mandarin and Beijing Mandarin"

    The final classificatory step equates the documented variability of neutral tone with that of lexical tones, but the characterization of lexical-tone variability itself rests on the same self-cited prior results; the reclassification therefore collapses to the self-citation load.

full rationale

The paper's central inference proceeds by documenting three properties for neutral tone (independent target, first-syllable dependence, embedding-predictable word signatures) and asserting these suffice for lexical status because the same properties hold for lexical tones. Each of the three comparison points is supported only by direct citation to Chuang et al. 2026 and Lu et al. 2026b (plus Jin et al. 2026), works whose author overlap with the present paper is evident from the citation list and author names. The final sentence then equates the observed variability with lexical-tone variability to reach the reclassification. No independent, non-self-cited criterion is supplied that would rule out a reduced/toneless analysis while preserving the same phonetic facts. This satisfies the self-citation-load-bearing pattern at the load-bearing step of the argument.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on standard assumptions in phonetics and computational linguistics without introducing new free parameters or invented entities in the abstract; specific numerical fits or model hyperparameters are not mentioned.

axioms (2)
  • domain assumption Pitch contours observed in spontaneous speech reliably reflect underlying tonal targets.
    Invoked to interpret the neutral tone as having its own target comparable to lexical tones.
  • domain assumption Contextualized embeddings encode semantic information that can influence or correlate with phonetic realization of tones.
    Used to link word meanings to observed word-specific pitch signatures.

pith-pipeline@v0.9.1-grok · 5879 in / 1368 out tokens · 32975 ms · 2026-06-26T01:24:31.742685+00:00 · methodology

Reference graph

Works this paper leans on

58 extracted references · 1 canonical work pages

  1. Foundation for Statistical Computing.

  2. Xu, Chenzi (2024). Cross-dialectal perspectives on …

  3. Chao, Yuen Ren. A grammar of spoken …

  4. Boersma, Paul and Weenink, David. Praat: doing phonetics by computer […

  5. Fon, J. A preliminary construction of …

  6. Ruan, Feifei and Song, Qiong and Li, Ke and Hao, Yufeng.

  7. Wood, Simon N. Generalized additive models: an introduction with …

  8. Mandarin lexical tone duration: Impact of speech style, word length, syllable position and prosodic position. Speech Communication, 2023.

  9. Visualizing data using t-SNE. Journal of machine learning research.

  10. Analyzing phonetic data with generalized additive mixed models. Manual of clinical phonetics, 2021.

  11. Lu, Yuxin and Chuang, Yu-Ying and Baayen, R Harald (2026). The realization of tones in spontaneous spoken …

  12. Lu, Yuxin and Chuang, Yu-Ying and Baayen, R Harald (2026). Form and meaning co-determine the realization of tone in …

  13. Jin, Xiaoyun and Ernestus, Mirjam and Baayen, R Harald (2026). A new kid on the block: …

  14. Chuang, Yu-Ying and Bell, M. J. and Tseng, Yu-Hsiang and Baayen, R. H. Word-specific tonal realizations in … Language.

  15. Xu, Yi (1997). Contextual tonal variations in …

  16. Gahl, Susanne and Baayen, R. Harald. Language.

  17. The discriminative lexicon: … Complexity, 2019.

  18. Heitmeier, Maria and Chuang, Yu-Ying and Baayen, R Harald. The Discriminative Lexicon: Theory, implementation in the …

  19. Goldman, Jean-Philippe. EasyAlign: An Automatic Phonetic Alignment Tool Under …

  20. Putonghua qingsheng yinjie texing fenxi. Applied Acoustics.

  21. Cao, Jianfen. On neutral-tone syllables in …

  22. Chao, Yuen Ren. A preliminary study of … The Tsai Yuan P'ei anniversary volume (supplementary Vol. I of the Bulletin of the Institute of History and Philology).

  23. Chen, Matthew Y. Tone Sandhi: … doi:10.1017/CBO9780511486364.

  24. Chen, Yiya and Xu, Yi (2006). Production of weak elements in speech – Evidence from …

  25. Chirkova, Ekaterina and Chen, Yiya. Beijing …

  26. Dong, Xiao and Liu, Fengming and Lin, Chien-Jer and Nesbitt, Monica and Shi, Shuju. Neutral Tone Variation in …

  27. Dreher, John J. and Lee, Pao-Ch'en (1968). Instrumental investigation of single and paired …

  28. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794.

  29. Hsieh, Feng-Fan and Chuang, Ching-Ting. A STUDY OF THE PHONETICS AND PHONOLOGY OF NEUTRAL TONES IN …

  30. Hsu, H.-C. Revisiting tone and prominence in …

  31. Hu, M. (1987).

  32. Huang, Jingwen and Li, Aijun. 轻声与非轻声之间轻重的连续统关系.

  33. Huang, Karen. Phonological identity of the neutral-tone syllables in …

  34. Kubler, Cornelius C. (1985). The influence of Southern Min on the …

  35. Lee, Wai-Sum. A phonetic study of the neutral tone in …

  36. Lee, Wai-Sum and Zee, Eric (2008). Prosodic characteristics of the neutral tone in …

  37. Li, Jennifer (2005). The preliminary study about neutral tone: Dialect effect between North Official …

  38. Li, Qian and Chen, Yiya (2019). Prosodically conditioned neutral-tone realization in …

  39. Li, Yan and Wu, Zhiyi (2018). Disyllabic Tone Sandhi and Neutral Tone Patterns in …

  40. Li, Yang and Thompson, Arthur L (2016). Production of neutral tones in three …

  41. Li, Yanping and Best, Catherine T. and Tyler, Michael D. and Burnham, Denis. Tone Variations in Regionally Accented …

  42. Lin, Maocan and Yan, Jingzhu. Beijinghua qingsheng de shengxue xingzhi (The acoustic features of the neutral tone in …

  43. Liu, Fang and Xu, Yi. The neutral tone in question intonation in …

  44. Lu, Yunzhong. Putonghua de qingsheng he erhua [Neutral tone and rhotacization in …

  45. Luo, C. and Wang, J. (1957).

  46. Ma, Wei-Yun and Chen, Keh-Jiann. Introduction to …

  47. Montreal forced aligner: Trainable text-speech alignment using kaldi. Interspeech.

  48. Qian, Xuelie. 论普通话的轻声词和儿化词 [On neutral tone words and erhua in …

  49. Qwen2.5 Technical Report. 2025.

  50. Shih, Chilin. The phonetics of the …

  51. Sun, Yan and Shih, Chilin (2021). Boundary-conditioned anticipatory tonal coarticulation in …

  52. Tseng, Chin-Chin. Prosodic properties of intonation in two major varieties of …

  53. Wei, Y. Journal of Southwest Agricultural University (Social Science Edition) 西南农业大学学报(社会科学版).

  54. Xu, Chenzi (2025). Plastic …

  55. Yan, Fang (2024). On word stress in …

  56. Tone: Phonology. 2006.

  57. Zhang, Qing (2005). A …

  58. Zhang, Zhenrui and Hu, Fang. Neutral Tone in …