
arXiv: 2605.16578 · v1 · pith:IG5772UI · submitted 2026-05-15 · 💻 cs.SD · cs.AI · cs.HC · cs.LG

Voice "Cloning" is Style Transfer

Pith reviewed 2026-05-19 20:58 UTC · model grok-4.3

classification: 💻 cs.SD · cs.AI · cs.HC · cs.LG
keywords: voice cloning, style transfer, speech synthesis, human perception, trust in AI, speaker homogenization, audio embeddings

The pith

Voice cloning models apply style transfer to source voices rather than faithfully replicating them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that widely used voice cloning systems do not preserve individual voices exactly but instead shift them toward a more uniform, polished style. Human listeners rate the resulting clones as more authoritative, warm, customer-service oriented, and human-like than the originals. Listeners also report higher trust in the clones and greater willingness to share sensitive information with them. The work further documents reduced variation across cloned outputs in accent, speaking rate, and embedding space, indicating homogenization of speaker traits.

Core claim

Voice cloning does not faithfully clone an individual's voice; instead, widely-used models systematically apply style transfer, so that cloned voices are perceived by human annotators as more authoritative, warm, customer-service-like, and human-like than their sources, with higher reported trust and willingness to disclose personal information, plus measurable homogenization in accent, rate, and audio embeddings.

What carries the argument

Human perceptual ratings of cloned versus source voices combined with variance measurements in accent, speaking rate, and audio embedding space.
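To make the variance half of this concrete, here is a minimal sketch of one way to compare the spread of source and cloned voices in an embedding space. It assumes each recording has already been mapped to a fixed-length embedding by some speaker encoder (which encoder the paper uses is not stated here), and it takes total variance, the mean squared distance to the centroid, as the spread measure. The helper names and toy vectors are hypothetical; this is not presented as the paper's exact metric.

```python
from typing import List, Sequence

# Hypothetical representation: each embedding is a list of floats of equal length.
Embedding = Sequence[float]


def centroid(embeddings: Sequence[Embedding]) -> List[float]:
    """Component-wise mean of a non-empty set of equal-length embeddings."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]


def total_variance(embeddings: Sequence[Embedding]) -> float:
    """Mean squared Euclidean distance to the centroid (trace of the population covariance)."""
    c = centroid(embeddings)
    squared = [sum((a - b) ** 2 for a, b in zip(e, c)) for e in embeddings]
    return sum(squared) / len(embeddings)


def variance_ratio(source: Sequence[Embedding], cloned: Sequence[Embedding]) -> float:
    """Cloned-to-source spread; a value below 1 is consistent with homogenization."""
    return total_variance(cloned) / total_variance(source)


if __name__ == "__main__":
    # Toy 2-D "embeddings": the clones sit closer to the shared centre than their sources.
    source = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    cloned = [[0.5, 0.5], [1.5, 0.5], [0.5, 1.5], [1.5, 1.5]]
    print(variance_ratio(source, cloned))  # 0.25
```

A ratio below 1 only says the clones are more tightly clustered; it does not say in which direction they moved, which is why a projection view such as the PCA plots in Figure 10 is a useful complement.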

If this is right

  • Applications that rely on voice cloning for identity preservation will still produce voices that systematically differ from the intended speaker in perceived authority and warmth.
  • Users may disclose more personal information to cloned voices than to the original speakers because of elevated trust ratings.
  • Synthetic speech outputs will exhibit narrower ranges of accent and pace, limiting diversity even when source voices vary widely.
  • Risk assessments for voice cloning must include behavioral effects on listeners beyond technical fidelity metrics.
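The narrowing of accent and pace in the third bullet can be quantified in several ways. One minimal sketch, assuming per-recording accent labels from some accent classifier plus per-recording word counts and durations, counts source-to-clone accent transitions (the flows a Sankey diagram such as Figure 4 draws), the share of recordings carrying the most common label, and the variance of speaking rate. The function names, labels, and timings are hypothetical and invented for illustration.

```python
from collections import Counter
from statistics import pvariance
from typing import Sequence


def accent_transitions(source_labels: Sequence[str], cloned_labels: Sequence[str]) -> Counter:
    """Count (source accent, cloned accent) pairs: the flows a Sankey diagram draws."""
    return Counter(zip(source_labels, cloned_labels))


def majority_share(labels: Sequence[str]) -> float:
    """Fraction of recordings that carry the single most frequent label."""
    _, top_count = Counter(labels).most_common(1)[0]
    return top_count / len(labels)


def speaking_rate_variance(word_counts: Sequence[int], durations_s: Sequence[float]) -> float:
    """Population variance of speaking rate, in words per second."""
    if len(word_counts) != len(durations_s):
        raise ValueError("word_counts and durations_s must be the same length")
    rates = [w / d for w, d in zip(word_counts, durations_s)]
    return pvariance(rates)


if __name__ == "__main__":
    # Invented labels and timings for illustration only.
    source = ["Indian", "Spanish", "Chinese", "US"]
    cloned = ["US", "US", "US", "Indian"]
    print(accent_transitions(source, cloned)[("Spanish", "US")])  # 1
    print(majority_share(source), majority_share(cloned))          # 0.25 0.75
    print(speaking_rate_variance([20] * 4, [5, 10, 8, 4]))         # 1.421875
    print(speaking_rate_variance([20] * 4, [6, 6.5, 6, 6.5]))      # much smaller spread
```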

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If style transfer is the dominant mechanism, then fine-tuning on more varied or less polished data could reduce both the positive bias and the homogenization effect.
  • The same mechanism could amplify or mask demographic signals in cloned speech, affecting fairness in applications such as virtual assistants or dubbing.
  • Homogenization may compound over successive cloning generations, further narrowing the distribution of synthetic voices in public media.
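The compounding effect in the third bullet can be illustrated with a deliberately simple toy model. This is an assumption of the sketch, not the paper's method: track one scalar voice feature (say, speaking rate) and assume each cloning round pulls it a fixed fraction of the way toward a shared default value, plus optional Gaussian noise. The loop mirrors the setup, though not necessarily the results, of the 50-round repeated-cloning experiment in Figure 5; all names and parameter values are hypothetical.

```python
import random
from statistics import pvariance
from typing import List


def clone_once(values: List[float], target: float, shrink: float,
               noise_sd: float, rng: random.Random) -> List[float]:
    """One hypothetical cloning round: move each value toward `target`, then add noise."""
    return [target + shrink * (v - target) + rng.gauss(0.0, noise_sd) for v in values]


def variance_over_rounds(values: List[float], rounds: int, target: float = 0.0,
                         shrink: float = 0.9, noise_sd: float = 0.0,
                         seed: int = 0) -> List[float]:
    """Population variance of the feature before cloning and after each round."""
    rng = random.Random(seed)
    history = [pvariance(values)]
    for _ in range(rounds):
        values = clone_once(values, target, shrink, noise_sd, rng)
        history.append(pvariance(values))
    return history


if __name__ == "__main__":
    history = variance_over_rounds([-1.0, 1.0], rounds=50)
    print(history[0], history[1])    # variance before and after one round
    print(history[-1] < history[0])  # True: the spread collapses toward the target
```

Without noise the variance shrinks by a factor of `shrink ** 2` per round. With `noise_sd > 0` it does not vanish: for this linear update it settles near `noise_sd ** 2 / (1 - shrink ** 2)`, so in the toy model the floor is set by the noise rather than by the starting voices.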

Load-bearing premise

Observed rating differences and reduced variance stem from style transfer built into the cloning models rather than from training data choices, model architecture, or evaluation confounders.

What would settle it

Train or fine-tune the same cloning architectures on data that explicitly avoids the observed style shifts and re-run the identical human rating and variance tests; absence of rating gains or variance reduction would falsify the style-transfer account.

Figures

Figures reproduced from arXiv: 2605.16578 by Anna Pot, Federico Bianchi, James Zou, Kaitlyn Zhou, Martijn Bartelds, Yongchan Kwon.

Figure 1. Study pipeline. We collect audio data from n=86 non-native English speakers, which we use as reference audio for voice cloning on three models (ElevenLabs, Coqui-XTTS, and Chatterbox). Each source recording is paired with its cloned counterpart and presented in a randomized order to n=177 annotators, whose ratings we analyze to characterize listener perception and self-reported behavioral responses.
Figure 2. Illustration of cross-sentence voice cloning.

Excerpt from §3.2 (Voice Cloning): We evaluate three widely used TTS models: two open-source (Chatterbox, Coqui-XTTS) and one state-of-the-art proprietary model (ElevenLabs V3). Open-source models were selected to reduce privacy risks by enabling greater control over speaker data, while ElevenLabs was included as a leading proprietary system that provides mechanisms for data rem…
Figure 3. Rating differences between cloned and source voices across all three models tested.
Figure 4. Shifts in classified accent after voice cloning. Sankey diagrams show how source accent …
Figure 5. Changes to cloned audio across 50 rounds of repeated cloning with …
Figure 7. Screenshot of speaker task.
Figure 8. ElevenLabs Privacy Terms. Shared anonymously online via a public research dataset that cannot be used for commercial purposes (explicit guidelines below). Forbidden uses of the public dataset include:
  • Generating, enabling, or promoting hate speech, harassment, discrimination, misinformation, or culturally offensive or harmful content
  • Beyond explicit research purposes, voice cloning, speaker impersonatio…
Figure 9. Comparison of cloning with long vs. short source clips (37 seconds versus 5 seconds). Long …
Figure 10. PCA projections of Chatterbox acoustic embeddings under different styles. Across …
Figure 11. Human annotations on ElevenLabs clones under "low expressiveness"
Figure 12. Rating differences between cloned and source voices by model.
Figure 13. Rating differences between cloned and source voices by speaker sex.
Figure 14. Screenshot of annotation task.
Figure 15. Change in entropy (in nats) of the duration distribution between source and cloned …
Figure 16. Change in predicted emotion across the 50 iterative rounds of cloning, visualized with 95% …
Figure 17. Probability distribution on incorrect speakers for source (top) and cloned (bottom) …
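Figure 15 reports a change in entropy, in nats, of the duration distribution. The paper's binning and estimator are not given in the captions above, so the sketch below assumes a plain fixed-width histogram (the 0.25 s bin width is an arbitrary, hypothetical choice) and natural-log Shannon entropy; a negative change means the cloned durations are more concentrated than the source durations.

```python
import math
from collections import Counter
from typing import Sequence


def histogram_entropy(durations_s: Sequence[float], bin_width: float) -> float:
    """Shannon entropy in nats of durations grouped into fixed-width bins."""
    bins = Counter(math.floor(d / bin_width) for d in durations_s)
    n = len(durations_s)
    return -sum((c / n) * math.log(c / n) for c in bins.values())


def entropy_change(source_s: Sequence[float], cloned_s: Sequence[float],
                   bin_width: float = 0.25) -> float:
    """Cloned minus source entropy; negative values indicate a narrower distribution."""
    return histogram_entropy(cloned_s, bin_width) - histogram_entropy(source_s, bin_width)


if __name__ == "__main__":
    source = [1.0, 2.0, 3.0, 4.0]   # four distinct bins -> ln 4 nats
    cloned = [2.0, 2.05, 2.1, 2.2]  # all in one bin -> 0 nats
    print(round(entropy_change(source, cloned), 4))  # -1.3863
```

Histogram entropy depends on the bin width, so source and cloned durations must be binned identically for the difference to be meaningful.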
Original abstract

Artificially generated speech is increasingly embedded in everyday life. Voice cloning in particular enables applications where identity preservation is important, such as completing a recording, dubbing in a new language, or preserving the voices of individuals with speech loss. However, in our work, we find that despite the term, voice cloning does not faithfully "clone" an individual's voice. Instead, we find that widely-used voice cloning models systematically apply style transfer to source voices. As rated by human annotators, cloned voices are perceived as more authoritative, warm, customer-service-like, and human-like compared to their sources. Human annotators also report greater trust in cloned voices than source voices, and a greater willingness to disclose sensitive personal information to them. Our work furthermore shows that voice cloning leads to homogenization of speaker characteristics, as measured by reduced variance in accent, speaking rate, and the audio embedding space. Together, our results highlight a new set of limitations and risks of voice cloning technology and their potential impact on human behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that voice cloning does not faithfully replicate source voices but instead systematically applies style transfer. Human annotators rate cloned voices as more authoritative, warm, customer-service-like, and human-like than their sources, report greater trust in them, and express greater willingness to disclose sensitive personal information. The work also reports homogenization of speaker characteristics, evidenced by reduced variance in accent, speaking rate, and audio embedding space.

Significance. If the central findings hold after addressing the noted gaps, the work would be significant for speech synthesis and AI ethics. It provides empirical evidence from human ratings and embedding measurements that challenges assumptions of faithful identity preservation in cloning systems and identifies risks of unintended perceptual bias and homogenization that could influence real-world user behavior and trust.

major comments (2)
  1. §3 (Experimental Setup): The central claim that the observed perceptual upgrades and variance reduction result from an inherent style-transfer operation inside cloning models, rather than from the training-data distribution, is not isolated by any ablation. No comparison to models trained on deliberately heterogeneous or non-professional corpora is reported, nor are training-data style statistics provided, leaving the causal attribution open to the alternative that models simply regress inputs toward the dominant training style.
  2. §4.1 (Human Annotation Study): The human-rating results lack reported sample sizes, number of annotators, statistical tests for rating differences, and controls for confounding factors such as audio quality or lexical content. These omissions limit direct support for the claims of systematic style shifts and increased trust.
minor comments (2)
  1. [Abstract] The term 'style transfer' is introduced in the abstract without a concise operational definition; adding one sentence in §2 would improve accessibility.
  2. [Figure 3] Embedding-space variance plots would be clearer with explicit numerical variance values annotated on the figure and consistent axis scaling across source vs. cloned conditions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which identify areas where additional clarity and detail will strengthen the manuscript. We respond to each major comment below and outline the revisions we will make.

read point-by-point responses
  1. Referee: §3 (Experimental Setup): The central claim that the observed perceptual upgrades and variance reduction result from an inherent style-transfer operation inside cloning models, rather than from the training-data distribution, is not isolated by any ablation. No comparison to models trained on deliberately heterogeneous or non-professional corpora is reported, nor are training-data style statistics provided, leaving the causal attribution open to the alternative that models simply regress inputs toward the dominant training style.

    Authors: We appreciate the referee's emphasis on causal isolation. Our study evaluates multiple widely deployed voice cloning systems (both commercial and open-source) that were trained on different corpora; the consistent direction of style shifts and variance reduction across these systems provides indirect support for an inherent operation rather than a single-dataset artifact. We nevertheless agree that the manuscript would benefit from explicitly acknowledging the regression-to-training-style alternative. We will add a paragraph in the Discussion section that presents this possibility, notes the absence of custom ablations on heterogeneous data, and identifies it as an important direction for future controlled experiments. revision: partial

  2. Referee: §4.1 (Human Annotation Study): The human-rating results lack reported sample sizes, number of annotators, statistical tests for rating differences, and controls for confounding factors such as audio quality or lexical content. These omissions limit direct support for the claims of systematic style shifts and increased trust.

    Authors: We agree that these methodological details should be stated explicitly in the main text rather than left implicit or relegated to supplementary material. We will revise §4.1 to report the number of annotators, the total number of ratings collected, the statistical tests performed (including p-values), and the controls used to hold lexical content and audio quality constant across source and cloned stimuli. revision: yes
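The specific statistical tests the authors commit to reporting are not named. For paired source/clone ratings, one assumption-light option is a two-sided sign-flip permutation test on the per-pair rating differences. The sketch below is illustrative only: the function names are hypothetical and the ratings are invented.

```python
import random
from statistics import mean
from typing import List, Sequence


def paired_differences(cloned: Sequence[float], source: Sequence[float]) -> List[float]:
    """Per-pair rating difference, cloned minus source (positive = clone rated higher)."""
    return [c - s for c, s in zip(cloned, source)]


def sign_flip_pvalue(diffs: Sequence[float], n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided Monte Carlo sign-flip test of H0: differences are symmetric about 0."""
    rng = random.Random(seed)
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Under H0 each difference is equally likely to carry either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Adding 1 to numerator and denominator keeps the Monte Carlo p-value above 0.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Invented 1-5 "trust" ratings for six source/clone pairs.
    source = [3, 2, 3, 4, 2, 3]
    cloned = [4, 4, 4, 5, 3, 4]
    diffs = paired_differences(cloned, source)
    print(diffs)  # [1, 2, 1, 1, 1, 1]
    print(sign_flip_pvalue(diffs))
```

This treats every pair as an independent unit. With 177 annotators each rating many pairs, and clones from three models per speaker, repeated ratings are not independent, so a mixed-effects model with annotator and speaker random effects would be the more conventional choice for the final analysis.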

Circularity Check

0 steps flagged

No circularity: empirical ratings and variance measurements stand independently

Full rationale

The paper reports direct human annotator ratings (authoritative, warm, customer-service-like, trust, disclosure willingness) and quantitative reductions in variance (accent, speaking rate, audio embedding space) as evidence that cloning applies style transfer. No equations, parameter fits, or predictions are presented that reduce by construction to inputs. No self-citations, uniqueness theorems, or ansatzes are invoked to justify the central claim. The results are observational and could be falsified by alternative training data or architectures, satisfying the criteria for a self-contained empirical finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the validity of subjective human ratings as proxies for style transfer and on the representativeness of the tested models and annotators.

axioms (1)
  • Domain assumption: Human annotator ratings of voice attributes such as authority and warmth reliably indicate systematic style transfer rather than random variation or rater bias.
    The perceptual and trust conclusions rest on these ratings being diagnostic of the underlying mechanism.

pith-pipeline@v0.9.0 · 5718 in / 1186 out tokens · 33635 ms · 2026-05-19T20:58:42.840159+00:00 · methodology


