Voice Mapping of Text-to-Speech Systems: A Metric-Based Approach for Voice Quality Assessment
Pith reviewed 2026-05-10 01:02 UTC · model grok-4.3
The pith
Voice range serves as the primary indicator of text-to-speech model capability, while cepstral peak prominence values distinguish natural from robotic speech.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We show that voice mapping, built from crest factor, spectrum balance, cepstral peak prominence, and voice range, quantifies TTS quality and expressiveness. Across the six models, voice range emerges as the main marker of capability: VITS displays the largest range; Glow-TTS records higher spectrum balance, indicating superior soft phonation; and CPPs values between 7 and 8 dB align with natural voice quality, while values exceeding 10 dB correspond to robotic speech. These patterns underscore the value of voice mapping for capturing vocal effort and dynamic range in synthesized speech.
What carries the argument
Voice mapping as a metric-based evaluation framework that combines crest factor, spectrum balance, cepstral peak prominence (CPPs), and voice range to assess naturalness, vocal effort, and expressiveness in TTS outputs.
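Two of these metrics can be sketched in plain NumPy. This is a minimal sketch, assuming conventional definitions: crest factor as peak-to-RMS ratio in dB, and spectrum balance as the high-band minus low-band energy ratio with an illustrative 1.5 kHz split. The page does not specify the paper's exact computation, so the split frequency and definitions here are assumptions.

```python
import numpy as np

def crest_factor_db(x):
    """Crest factor: peak amplitude over RMS, in dB."""
    rms = np.sqrt(np.mean(x ** 2))
    return 20 * np.log10(np.max(np.abs(x)) / rms)

def spectrum_balance_db(x, sr, split_hz=1500.0):
    """Spectrum balance: high-band minus low-band energy, in dB.
    The 1.5 kHz split is an illustrative choice; the paper does not
    state its exact definition."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    low = spec[freqs < split_hz].sum()
    high = spec[freqs >= split_hz].sum()
    return 10 * np.log10(high / low)

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220 * t)  # a pure tone as a stand-in for speech
print(round(crest_factor_db(x), 2))  # 3.01 dB, the textbook value for a sinusoid
```

On real speech, both metrics would be computed per analysis window and aggregated over the voice range, which is what the "map" in voice mapping refers to.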
If this is right
- VITS shows the largest voice range among the tested models, indicating strongest handling of vocal dynamics.
- Glow-TTS achieves higher spectrum balance values, indicating better soft phonation despite its more limited range.
- CPPs values between 7 and 8 dB mark natural voice quality, while values above 10 dB mark robotic speech.
- Voice range functions as the leading single indicator of overall TTS model capability.
- Objective voice mapping is required to evaluate vocal effort and expressiveness beyond what current metrics provide.
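Taken at face value, the reported thresholds amount to a simple labeling rule. The cutoffs below are copied from the claims above; treating the 8–10 dB gap as indeterminate is an assumption, since the paper does not characterize that region.

```python
def label_from_cpps(cpps_db):
    """Map a CPPs value (dB) to the perceptual label the paper reports.
    Thresholds are the paper's observational findings, not validated cutoffs."""
    if 7.0 <= cpps_db <= 8.0:
        return "natural"
    if cpps_db > 10.0:
        return "robotic"
    return "indeterminate"  # region the paper does not characterize

print(label_from_cpps(7.5))   # natural
print(label_from_cpps(11.0))  # robotic
```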
Where Pith is reading between the lines
- The same metric set could screen new TTS architectures for naturalness before any listener study is run.
- Training objectives could be adjusted to target the 7-8 dB CPPs window identified for natural output.
- The framework highlights design trade-offs, such as trading voice range for improved phonation control.
- Direct comparison of the same metrics on human reference speech would test whether the thresholds generalize beyond the six models.
Load-bearing premise
That the three chosen acoustic metrics plus voice range fully capture perceived naturalness, vocal effort, and expressiveness without any subjective listening tests or comparison to human speech baselines.
What would settle it
A blind listening test in which human raters score the naturalness and expressiveness of the same TTS samples and the ratings fail to align with the reported CPPs thresholds or voice-range ordering.
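The settling test reduces to a rank correlation between listener scores and the metric: if the ordering holds, the correlation should be strongly negative (lower CPPs tracking higher rated naturalness). A sketch with hypothetical per-model numbers, none of which come from the paper:

```python
from scipy.stats import spearmanr

# Hypothetical per-model means: listener naturalness (MOS, 1-5 scale)
# and CPPs (dB). All values are illustrative placeholders.
mos  = [3.1, 4.2, 3.8, 3.9, 4.0, 4.5]
cpps = [11.2, 7.4, 7.8, 8.1, 7.9, 7.2]

rho, p = spearmanr(mos, cpps)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```

A rho near zero or positive on real listening data would falsify the claimed CPPs-to-naturalness mapping; a strongly negative rho would support it.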
Original abstract
This study investigates voice mapping as an evaluation framework for text-to-speech (TTS) synthesis quality. The study analyzes six TTS models, including historical and recent ones. The metrics are crest factor, spectrum balance, and cepstral peak prominence (CPPs). We investigated 6 influential TTS models: Merlin, Tacotron 2, Transformer TTS, FastSpeech 2, Glow-TTS, and VITS. The results demonstrate that voice range serves as a primary indicator of model capability, with VITS showing the largest range among tested models. Glow-TTS exhibited superior performance in soft phonation, indicated by higher spectrum balance, despite limited voice range. The results showed that the CPPs values between 7-8 dB indicate natural voice quality, while with CPPs exceeding 10 dB, the speech tends to sound robotic. These findings underscore the need for voice mapping to evaluate vocal effort, and capture how TTS systems handle voice dynamic and expressiveness.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a voice-mapping evaluation framework for TTS synthesis that computes three acoustic metrics—crest factor, spectrum balance, and cepstral peak prominence (CPPs)—plus voice-range statistics on outputs from six systems (Merlin, Tacotron 2, Transformer TTS, FastSpeech 2, Glow-TTS, VITS). It claims that voice range is the primary indicator of model capability (VITS largest), that Glow-TTS excels in soft phonation via spectrum balance, and that CPPs values of 7–8 dB correspond to natural voice quality while values >10 dB indicate robotic speech.
Significance. If the metric-to-perception mapping were validated, the approach would supply an objective, reference-free method for diagnosing vocal effort, dynamic range, and expressiveness in TTS, potentially reducing reliance on costly listening tests. The multi-model comparison across historical and modern architectures is a positive feature, but the absence of any anchoring data leaves the interpretive claims unsupported.
major comments (3)
- [Abstract / Results] Abstract and results section: the specific thresholds 'CPPs values between 7-8 dB indicate natural voice quality' and 'CPPs exceeding 10 dB, the speech tends to sound robotic' are stated without any accompanying measurements, statistical tests, error bars, raw data tables, or subjective listening tests that would anchor the numerical cutoffs to perceptual labels.
- [Abstract] Abstract: the assertion that 'voice range serves as a primary indicator of model capability' and that the three chosen metrics 'fully capture' vocal effort and expressiveness is presented without justification, ablation studies, or comparison against human-speech baselines, making the central interpretive claims rest on an untested assumption.
- [Abstract] Abstract: no description is given of how the metrics were computed (windowing, normalization, exact CPPs definition, or reference implementations), nor are any quantitative results, confidence intervals, or cross-model statistical comparisons supplied to support the ranking of VITS and Glow-TTS.
minor comments (2)
- [Abstract] The acronym CPPs is introduced without an explicit expansion on first use.
- [Results] The manuscript would benefit from a table summarizing the six models, their training data, and the exact metric values obtained for each.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below, indicating the revisions we will make to strengthen the paper.
Point-by-point responses
-
Referee: [Abstract / Results] Abstract and results section: the specific thresholds 'CPPs values between 7-8 dB indicate natural voice quality' and 'CPPs exceeding 10 dB, the speech tends to sound robotic' are stated without any accompanying measurements, statistical tests, error bars, raw data tables, or subjective listening tests that would anchor the numerical cutoffs to perceptual labels.
Authors: We acknowledge that the perceptual mapping of CPPs thresholds was presented without sufficient supporting data in the original submission. These ranges were derived from the distribution of computed values across the six TTS systems, where lower CPPs aligned with models producing more natural-sounding output in our internal checks. In the revision, we will add a table reporting mean CPPs values with standard deviations for each model, along with the underlying per-utterance data summary. We will also qualify the statements as observational findings based on the metric distributions rather than validated perceptual cutoffs, and cite relevant literature on CPPs in voice quality. No new listening tests will be added, as the study focuses on objective metrics. revision: partial
-
Referee: [Abstract] Abstract: the assertion that 'voice range serves as a primary indicator of model capability' and that the three chosen metrics 'fully capture' vocal effort and expressiveness is presented without justification, ablation studies, or comparison against human-speech baselines, making the central interpretive claims rest on an untested assumption.
Authors: The claim regarding voice range as a primary indicator stems from the comparative results, where VITS exhibited the widest range consistent with its established performance. We agree that the phrasing 'fully capture' is overstated and will revise the abstract and discussion to state that the metrics 'offer insights into' vocal effort and expressiveness. We will incorporate comparisons against human-speech baselines drawn from standard corpora (e.g., mean voice range and spectrum balance values from natural recordings) in the results section. Ablation studies on metric combinations are outside the current scope but will be noted as future work. revision: partial
-
Referee: [Abstract] Abstract: no description is given of how the metrics were computed (windowing, normalization, exact CPPs definition, or reference implementations), nor are any quantitative results, confidence intervals, or cross-model statistical comparisons supplied to support the ranking of VITS and Glow-TTS.
Authors: We will add a dedicated subsection in the methods describing the exact computation pipeline for each metric, including window length and overlap, normalization steps, the specific CPPs implementation (following the standard definition with 0.5 ms quefrency range), and references to open-source implementations. Quantitative results will be expanded into tables showing per-model means, standard deviations, and 95% confidence intervals, accompanied by statistical comparisons (ANOVA with Tukey post-hoc tests) to justify the reported rankings of VITS and Glow-TTS. revision: yes
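The statistical plan the authors commit to (per-model means with one-way ANOVA and Tukey post-hoc comparisons) can be sketched with SciPy. The per-utterance CPPs samples below are synthetic placeholders with illustrative means, not the paper's data.

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

rng = np.random.default_rng(0)
# Hypothetical per-utterance CPPs samples (dB) for three of the models;
# the group means and spreads are illustrative assumptions.
vits     = rng.normal(7.5, 0.6, 30)
glow_tts = rng.normal(8.2, 0.6, 30)
merlin   = rng.normal(10.8, 0.6, 30)

f, p = f_oneway(vits, glow_tts, merlin)   # omnibus test across models
res = tukey_hsd(vits, glow_tts, merlin)   # pairwise differences with 95% CIs
print(f"ANOVA: F = {f:.1f}, p = {p:.2g}")
print(res)
```

With real per-utterance data, the Tukey table would directly justify (or undercut) the reported VITS and Glow-TTS rankings.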
Circularity Check
No circularity: empirical metric reporting with no derivation chain
Full rationale
The paper is an observational study that computes three acoustic metrics (crest factor, spectrum balance, CPPs) on outputs from six TTS systems and reports ranges and thresholds. No equations, fitted parameters, predictions, or self-citations appear in the provided text. Interpretive statements equating CPPs ranges to 'natural' vs 'robotic' quality are direct observations from the data rather than reductions of any claimed derivation to its own inputs. This is self-contained empirical reporting against the chosen metrics; the absence of subjective listening tests is a validation weakness but not a circularity issue per the enumerated patterns.