pith. machine review for the scientific record. sign in

arxiv: 2605.07291 · v1 · submitted 2026-05-08 · 📡 eess.AS

Recognition: 2 theorem links

· Lean Theorem

Evaluating voice anonymisation using similarity rank disclosure

Authors on Pith no claims yet

Pith reviewed 2026-05-11 00:51 UTC · model grok-4.3

classification 📡 eess.AS
keywords voice anonymisationprivacy evaluationsimilarity rank disclosurespeaker verificationequal error ratefeature representationsVoicePrivacy Challenge
0
0 comments X

The pith

Similarity rank disclosure exposes voice anonymisation privacy leaks that equal error rates miss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current evaluation of voice anonymisation depends on equal error rates from automatic speaker verification systems, which vary with the choice of classifier and threshold and therefore give an incomplete view of privacy risk. The paper introduces similarity rank disclosure as an information-theoretic alternative that works directly on feature representations instead of classifier outputs. This metric supplies a threshold-independent score for both average and worst-case leakage of speaker identity. When applied to speaker embeddings, fundamental frequency, and phone embeddings from the 2024 VoicePrivacy Challenge systems, it uncovers specific leaks and system weaknesses that EER-based tests overlook. A reader should care because more reliable privacy measurement can guide the design of anonymisation methods that actually protect speaker identity across different evaluation setups.

Core claim

Similarity rank disclosure is an information-theoretic metric that ranks similarities between original and anonymised feature representations to quantify privacy leakage without reference to any classifier or decision threshold. Applied to the 2024 VoicePrivacy Challenge submissions, the metric detects privacy leaks and system-specific weaknesses in speaker embeddings, fundamental frequency, and phone embeddings that remain hidden when the same systems are assessed solely by equal error rate.

What carries the argument

Similarity rank disclosure (SRD), an information-theoretic rank metric computed on feature representations that measures how much original speaker identity remains distinguishable through ordered similarity scores.

If this is right

  • SRD identifies privacy leaks missed by EER-based evaluation of voice anonymisation systems.
  • The metric supports separate assessment of average and worst-case disclosure without choosing a threshold.
  • Representation-level analysis reveals weaknesses tied to particular features such as embeddings or fundamental frequency.
  • SRD functions as a flexible, interpretable complement to existing verification metrics for anonymisation evaluation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Adoption of SRD would encourage anonymisation algorithms to reduce ranked similarity leakage rather than optimise only for classifier error rates.
  • The same rank-based approach could be tested on other biometric modalities where feature representations are available.
  • Worst-case SRD scores might become a practical target for regulatory or auditing standards that require guaranteed privacy bounds.

Load-bearing premise

That operating on feature representations with an information-theoretic rank metric supplies a more complete and less misleading characterisation of privacy risk than classifier-dependent equal error rates.

What would settle it

A controlled test in which the same anonymised utterances produce low equal error rate (suggesting strong privacy) but high similarity rank disclosure (suggesting substantial leakage), or the reverse mismatch, on identical data.

Figures

Figures reproduced from arXiv: 2605.07291 by Dorothea Kolossa, Massimiliano Todisco, Matteo Petten\`o, Michele Panariello, Nicholas Evans, Nicolas Gengembre, Rainer Martin, Shilpa Chandra, Themos Stafylakis, Tom B\"ackstr\"om.

Figure 1
Figure 1. Figure 1: Typical and ideal rank histogram distributions for original and anonymised data. is derived from a set of inputs x. The histogram is normal￾ized as an empirical probability distribution p˜k of matching speaker rank. Example empirical probability distributions are shown in [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Rank histograms for the ECAPA-TDNN embeddings ETorig and non-timbral embeddings W-NTorig, for original speech (top left plot only) and anonymised speech (all others). 10 0 10 1 10 2 6 × 10 3 2 × 10 2 3 × 10 2 4 × 10 2 System: ORIGINAL Phone F0 10 0 10 1 System: B3 10 0 10 1 System: B4 10 0 10 1 System: B5 10 0 10 1 10 2 6 × 10 3 2 × 10 2 3 × 10 2 4 × 10 2 System: T8-5 10 0 10 1 System: T10-2 10 0 10 1 Syst… view at source ↗
Figure 3
Figure 3. Figure 3: As for [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: As for [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
read the original abstract

The evaluation of voice anonymisation remains challenging. Current practice relies on automatic speaker verification metrics such as the equal error rate (EER). Performance estimates dependent on the classifier and operating point provide an incomplete or even misleading characterisation of privacy risk. We investigate the use of similarity rank disclosure (SRD), an information-theoretic metric, which operates on feature representations rather than classifier decisions, providing a threshold-independent assessment of privacy and analysis of both average and worst-case disclosure. We report its application to speaker embeddings, fundamental frequency, and phone embeddings using 2024 VoicePrivacy Challenge systems. The SRD reveals privacy leaks and system-specific weaknesses missed by EER-based evaluation. Findings highlight the merit of representation-level metrics and demonstrate the potential of SRD as a flexible and interpretable tool for the evaluation of voice anonymisation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that similarity rank disclosure (SRD), an information-theoretic metric operating directly on feature representations (speaker embeddings, F0, phone embeddings), provides a threshold-independent evaluation of voice anonymisation privacy that reveals leaks and system-specific weaknesses missed by equal error rate (EER) assessments. It demonstrates this by applying SRD to the 2024 VoicePrivacy Challenge systems.

Significance. If the central claim holds, the work supplies a concrete, representation-level alternative to classifier-dependent EER that could improve privacy auditing in voice anonymisation. The explicit application to recent challenge systems and the focus on both average and worst-case disclosure are strengths that would make the metric practically useful if validated.

major comments (2)
  1. [§4, §5] §4 (Results) and §5 (Discussion): the central assertion that SRD 'reveals privacy leaks and system-specific weaknesses missed by EER' is load-bearing but unsupported by any side-by-side comparison. No table or figure shows that utterances or systems flagged as high-SRD by the rank metric actually produce higher identification success rates under a concrete attacker model than EER predicts.
  2. [§3.2] §3.2 (SRD definition): the information-theoretic rank disclosure is defined on feature representations, yet the manuscript provides no quantitative check that the additional disclosures flagged by SRD correspond to practically exploitable privacy risks rather than statistical dependencies that remain non-actionable for an adversary.
minor comments (2)
  1. [§3] Notation for the SRD formula is introduced without an explicit equation number; cross-referencing would improve readability.
  2. [Abstract, §4] The abstract states that SRD is 'threshold-independent,' but the results section should explicitly contrast the operating-point dependence of EER with the rank-based formulation to make the advantage concrete.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important aspects of how SRD should be positioned relative to EER and practical attacker models. We address each major comment below and propose targeted revisions to improve clarity and strengthen the claims without overstating the current evidence.

read point-by-point responses
  1. Referee: [§4, §5] §4 (Results) and §5 (Discussion): the central assertion that SRD 'reveals privacy leaks and system-specific weaknesses missed by EER' is load-bearing but unsupported by any side-by-side comparison. No table or figure shows that utterances or systems flagged as high-SRD by the rank metric actually produce higher identification success rates under a concrete attacker model than EER predicts.

    Authors: We agree that a direct empirical comparison between SRD-flagged cases and actual identification success rates under a concrete attacker would provide stronger validation. The current manuscript demonstrates that SRD can surface differences (e.g., systems with comparable EER but divergent SRD) at the representation level, which EER cannot capture because it depends on a specific verifier and threshold. However, we do not include an explicit attacker simulation linking high-SRD utterances to higher success rates. In revision we will add a new paragraph in §5 that (a) explicitly states this limitation, (b) illustrates with the existing data how rank information could be leveraged by an adversary (e.g., nearest-neighbor lookup in the embedding space), and (c) outlines a possible future experiment that would close the gap. This keeps the focus on the representation-level contribution while acknowledging the missing link to end-to-end attack performance. revision: partial

  2. Referee: [§3.2] §3.2 (SRD definition): the information-theoretic rank disclosure is defined on feature representations, yet the manuscript provides no quantitative check that the additional disclosures flagged by SRD correspond to practically exploitable privacy risks rather than statistical dependencies that remain non-actionable for an adversary.

    Authors: SRD is intentionally defined on the raw feature representations precisely because voice anonymization is intended to prevent an adversary from extracting usable speaker information from those representations. The rank statistic directly quantifies how much similarity ordering is preserved, which is actionable for any downstream nearest-neighbor or ranking-based attack. That said, the manuscript does not provide a quantitative mapping from SRD values to measured attack success rates. We will revise §3.2 to include a short paragraph explaining why representation-level rank disclosure is a conservative (i.e., upper-bound) indicator of risk, and we will expand the discussion in §5 to reference existing literature on embedding-based attacks that could exploit the disclosed ranks. No new attack experiments will be added in this revision, but the text will make the actionability argument more explicit. revision: partial

Circularity Check

0 steps flagged

SRD introduced as independent information-theoretic metric on features; no reduction to EER or self-inputs

full rationale

The paper defines SRD explicitly as an information-theoretic metric operating directly on feature representations (speaker embeddings, F0, phone embeddings) rather than classifier outputs or decisions. This is positioned as an alternative to EER-based evaluation without deriving SRD from EER, fitting parameters to EER data, or invoking self-citations as load-bearing uniqueness theorems. The reported application to 2024 VoicePrivacy systems and the claim that SRD reveals additional leaks are empirical observations, not tautological redefinitions or predictions forced by construction. No equations or steps in the provided text reduce the central metric or claims to the inputs by definition, self-citation chains, or ansatzes smuggled from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are described. SRD is presented as an information-theoretic metric applied to existing feature types.

pith-pipeline@v0.9.0 · 5470 in / 1075 out tokens · 53215 ms · 2026-05-11T00:51:13.774454+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
    ?
    unclear

    Relation between the paper passage and the cited Recognition theorem.

    The SRD is an information-theoretic metric, which operates on feature representations rather than classifier decisions, providing a threshold-independent assessment of privacy and analysis of both average and worst-case disclosure.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 7 canonical work pages · 1 internal anchor

  1. [1]

    Beyond voice iden- tity, speech recordings can reveal sensitive attributes such as the speaker’s age, gender, emotional state, etc

    Introduction Smart devices and cloud services are nowadays constantly cap- turing and processing speech data [1, 2]. Beyond voice iden- tity, speech recordings can reveal sensitive attributes such as the speaker’s age, gender, emotional state, etc. [2, 3]. The per- vasive capture and inherent sensitivity of speech data, coupled with evolving privacy regul...

  2. [2]

    Evaluating voice anonymisation using similarity rank disclosure

    Related work The strength of any approach to privacy protection is usually es- timated empirically according to a particular threat model [12] and simulated attacks launched to defeat the protection. The strength is then quantified according to some objective met- ric that indicates the attack success rate. In the case of voice anonymisation, the metric r...

  3. [3]

    Information is quantified in bits, hence enabling comparisons between the disclosure of information contained in different speech characteristics

    Similarity Rank Disclosure The SRD provides a framework to measure PII contained within speech utterances [7]. Information is quantified in bits, hence enabling comparisons between the disclosure of information contained in different speech characteristics. Operating di- rectly upon speech features instead of classifier decisions, the SRD provides a class...

  4. [4]

    In the following section, we describe the data used and a set of four feature representations

    Experimental Setup While the original work [7] reports a study of privacy disclo- sure for original, unprotected speech data, we have applied the SRD to the study and comparison of privacy disclosure for speech data treated with different approaches to voice anonymi- sation [6]. In the following section, we describe the data used and a set of four feature...

  5. [5]

    Results We present rank histograms for features described in Section 4.2 and differences in qualitative results derived using a stronger semi-informed attack model. We then present quantitative re- sults derived using metrics described in Section 3.3, followed by a comparison to results from the use of statistical approxi- mations described in Section 3.2...

  6. [6]

    These go beyond a single snapshot like that provided from estimates of the EER in the form of the mean and worst case disclosure and the rank spread

    Discussion The SRD provides revealing insights into the differences in pri- vacy protection for competing anonymisation solutions. These go beyond a single snapshot like that provided from estimates of the EER in the form of the mean and worst case disclosure and the rank spread. By casting evaluation as an identification prob- lem instead of verification...

  7. [7]

    Compared to the automatic speaker verification equal error rate (EER), the SRD offers a more interpretable and fine-grained characterisation of resid- ual privacy risk

    Conclusions We investigated use of the similarity rank disclosure (SRD) for evaluating voice anonymisation, providing an information- theoretic assessment of privacy. Compared to the automatic speaker verification equal error rate (EER), the SRD offers a more interpretable and fine-grained characterisation of resid- ual privacy risk. Results for 2024 V oi...

  8. [8]

    We would also like to thank Rayane Bakari, Nicolas Gengembre, and Olivier Le Blouch (Orange innovation, France) for provid- ing the pre-trained models for W-NT

    Acknowledgements This work was funded by the European Union’s Horizon Europe research and innovation programme grant No 101168193. We would also like to thank Rayane Bakari, Nicolas Gengembre, and Olivier Le Blouch (Orange innovation, France) for provid- ing the pre-trained models for W-NT

  9. [9]

    Understanding the tradeoffs in client-side privacy for downstream speech tasks,

    P. Wu, P. P. Liang, J. Shi, R. Salakhutdinov, S. Watanabe, and L.-P. Morency, “Understanding the tradeoffs in client-side privacy for downstream speech tasks,” in2021 Asia-Pacific Signal and Infor- mation Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2021, pp. 841–848

  10. [10]

    Privacy in speech technology,

    T. Bäckström, “Privacy in speech technology,”Proceedings of the IEEE, vol. 113, no. 7, pp. 668–692, 2025

  11. [11]

    Preserving privacy in speaker and speech characterisation,

    A. Nautsch, A. Jiménez, A. Treiber, J. Kolberg, C. Jasserand, E. Kindt, H. Delgado, M. Todisco, M. A. Hmani, A. Mtibaa, M. A. Abdelraheem, A. Abad, F. Teixeira, D. Matrouf, M. Gomez- Barrero, D. Petrovska-Delacrétaz, G. Chollet, N. Evans, T. Schneider, J.-F. Bonastre, B. Raj, I. Trancoso, and C. Busch, “Preserving privacy in speaker and speech characteris...

  12. [12]

    The GDPR & Speech Data: Reflections of Legal and Technology Communities, First Steps Towards a Common Un- derstanding,

    A. Nautsch, C. Jasserand, E. Kindt, M. Todisco, I. Trancoso, and N. Evans, “The GDPR & Speech Data: Reflections of Legal and Technology Communities, First Steps Towards a Common Un- derstanding,” inInterspeech 2019. ISCA, Sep. 2019, pp. 3695– 3699

  13. [13]

    Introducing the V oicePrivacy Initiative,

    N. Tomashenko, B. M. L. Srivastava, X. Wang, E. Vincent, A. Nautsch, J. Yamagishi, N. Evans, J. Patino, J.-F. Bonastre, P.-G. Noé, and M. Todisco, “Introducing the V oicePrivacy Initiative,” in Interspeech 2020. ISCA, Oct. 2020, pp. 1693–1697

  14. [14]

    Tomashenko, X

    N. Tomashenko, X. Miao, P. Champion, S. Meyer, X. Wang, E. Vincent, M. Panariello, N. Evans, J. Yamagishi, and M. Todisco. The V oicePrivacy 2024 Challenge Evaluation Plan. [Online]. Available: http://arxiv.org/abs/2404.02677

  15. [15]

    Privacy disclosure of similarity rank in speech and language processing,

    T. Bäckström, M. H. Vali, M. Nguyen, and S. Rech, “Privacy disclosure of similarity rank in speech and language processing,” IEEE Transactions on Audio, Speech and Language Processing, vol. 34, pp. 196–205, 2026

  16. [16]

    YIN, a fundamental frequency estimator for speech and music,

    A. de Cheveigné and H. Kawahara, “YIN, a fundamental frequency estimator for speech and music,”The Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1917–1930, 04

  17. [17]
  18. [18]

    Vector-Quantized Neural Networks for Acoustic Unit Discovery in the ZeroSpeech 2020 Challenge,

    B. van Niekerk, L. Nortje, and H. Kamper, “Vector-Quantized Neural Networks for Acoustic Unit Discovery in the ZeroSpeech 2020 Challenge,” inInterspeech 2020, 2020, pp. 4836–4840

  19. [19]

    ECAPA- TDNN: Emphasized Channel Attention, Propagation and Ag- gregation in TDNN Based Speaker Verification,

    B. Desplanques, J. Thienpondt, and K. Demuynck, “ECAPA- TDNN: Emphasized Channel Attention, Propagation and Ag- gregation in TDNN Based Speaker Verification,” inInterspeech 2020, 2020, pp. 3830–3834

  20. [20]

    The influence of non-timbral cues in voice anonymisation and evaluation,

    R. Bakari, O. L. Blouch, N. Evans, N. Gengembre, M. Panariello, and M. Todisco, “The influence of non-timbral cues in voice anonymisation and evaluation,” in5th Symposium on Security and Privacy in Speech Communication, 2025, pp. 35–42

  21. [21]

    Scenario of Use Scheme: Threat Modelling for Speaker Privacy Protection in the Medical Domain,

    M. U. Rahman, M. Larson, L. ten Bosch, and C. Tejedor-García, “Scenario of Use Scheme: Threat Modelling for Speaker Privacy Protection in the Medical Domain,” in4th Symposium on Security and Privacy in Speech Communication, 2024, pp. 21–25

  22. [22]

    A comparative study of speech anonymization metrics,

    M. Maouche, B. M. L. Srivastava, N. Vauquier, A. Bellet, M. Tommasi, and E. Vincent, “A comparative study of speech anonymization metrics,” inINTERSPEECH 2020, 2020

  23. [23]

    The V oicePrivacy 2020 challenge: results and findings,

    N. Tomashenko, X. Wang, E. Vincent, J. Patino, B. M. L. Srivas- tava, P.-G. Noé, A. Nautsch, N. Evans, J. Yamagishi, B. O’Brien et al., “The V oicePrivacy 2020 challenge: results and findings,” Computer Speech & Language, vol. 74, p. 101362, 2022

  24. [24]

    Out of a Hundred Trials, How Many Errors Does Your Speaker Verifier Make?

    N. Brümmer, L. Ferrer, and A. Swart, “Out of a Hundred Trials, How Many Errors Does Your Speaker Verifier Make?” inInter- speech 2021, 2021, pp. 1059–1063

  25. [25]

    t-eer: Parameter-free tandem evaluation of countermeasures and biometric comparators,

    T. H. Kinnunen, K. A. Lee, H. Tak, N. Evans, and A. Nautsch, “t-eer: Parameter-free tandem evaluation of countermeasures and biometric comparators,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 5, pp. 2622–2637, 2024

  26. [26]

    The Privacy ZE- BRA: Zero Evidence Biometric Recognition Assessment,

    A. Nautsch, J. Patino, N. Tomashenko, J. Yamagishi, P.-G. Noé, J.-F. Bonastre, M. Todisco, and N. Evans, “The Privacy ZE- BRA: Zero Evidence Biometric Recognition Assessment,” inIn- terspeech 2020, 2020, pp. 1698–1702

  27. [27]

    Application-independent evaluation of speaker detection,

    N. Brümmer and J. du Preez, “Application-independent evaluation of speaker detection,”Computer Speech & Language, vol. 20, no. 2, pp. 230–275, 2006, odyssey 2004: The speaker and Language Recognition Workshop. [Online]. Available: https:// www.sciencedirect.com/science/article/pii/S0885230805000483

  28. [28]

    Gen- eral framework to evaluate unlinkability in biometric template protection systems,

    M. Gomez-Barrero, J. Galbally, C. Rathgeb, and C. Busch, “Gen- eral framework to evaluate unlinkability in biometric template protection systems,”IEEE Transactions on Information Forensics and Security, vol. 13, no. 6, pp. 1406–1420, 2017

  29. [29]

    Legally validated evaluation framework for voice anonymiza- tion,

    N. Vauquier, B. M. L. Srivastava, S. A. Hosseini, and E. Vincent, “Legally validated evaluation framework for voice anonymiza- tion,” inInterspeech 2025, 2025, pp. 3229–3233

  30. [30]

    Tomashenko, X

    N. Tomashenko, X. Miao, E. Vincent, and J. Yamagishi. The First V oicePrivacy Attacker Challenge Evaluation Plan. [Online]. Available: http://arxiv.org/abs/2410.07428

  31. [31]

    Tomashenko, X

    N. Tomashenko, X. Wang, X. Miao, H. Nourtel, P. Champion, M. Todisco, E. Vincent, N. Evans, J. Yamagishi, and J.-F. Bonastre. The V oicePrivacy 2022 Challenge Evaluation Plan. [Online]. Available: http://arxiv.org/abs/2203.12468

  32. [32]

    WavLM: Large- Scale Self-Supervised Pre-Training for Full Stack Speech Pro- cessing,

    S. Chen, C. Wang, Z. Chen, Y . Wu, S. Liu, Z. Chen, J. Li, N. Kanda, T. Yoshioka, X. Xiao, J. Wu, L. Zhou, S. Ren, Y . Qian, Y . Qian, J. Wu, M. Zeng, X. Yu, and F. Wei, “WavLM: Large- Scale Self-Supervised Pre-Training for Full Stack Speech Pro- cessing,”IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 6, pp. 1505–1518, 2022

  33. [33]

    Lib- rispeech: An ASR corpus based on public domain audio books,

    V . Panayotov, G. Chen, D. Povey, and S. Khudanpur, “Lib- rispeech: An ASR corpus based on public domain audio books,” in2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 5206–5210

  34. [34]

    SpeechBrain: A general-purpose speech toolkit,

    M. Ravanelli, T. Parcollet, P. Plantinga, A. Rouhe, S. Cornell, L. Lugosch, C. Subakan, N. Dawalatabad, A. Heba, J. Zhong, J.-C. Chou, S.-L. Yeh, S.-W. Fu, C.-F. Liao, E. Rastorgueva, F. Grondin, W. Aris, H. Na, Y . Gao, R. D. Mori, and Y . Bengio. SpeechBrain: A General-Purpose Speech Toolkit. [Online]. Available: http://arxiv.org/abs/2106.04624

  35. [35]

    Disentangling prosody and timbre embeddings via voice conversion,

    N. Gengembre, O. Le Blouch, and C. Gendrot, “Disentangling prosody and timbre embeddings via voice conversion,” inInter- speech 2024, 2024, pp. 2765–2769

  36. [36]

    PYIN a fundamental frequency esti- mator using probabilistic threshold distributions,

    M. Mauch and S. Dixon, “PYIN a fundamental frequency esti- mator using probabilistic threshold distributions,” in2014 IEEE International Conference on Acoustics, Speech and Signal Pro- cessing (ICASSP), 2014, pp. 659–663

  37. [37]

    Interpretable Latent Space Using Space-Filling Curves for Phonetic Analysis in V oice Conversion,

    M. H. Vali and T. Bäckström, “Interpretable Latent Space Using Space-Filling Curves for Phonetic Analysis in V oice Conversion,” inInterspeech 2023, 2023, pp. 306–310

  38. [38]

    Exploiting Context-dependent Duration Features for V oice Anonymization Attack Systems,

    N. Tomashenko, E. Vincent, and M. Tommasi, “Exploiting Context-dependent Duration Features for V oice Anonymization Attack Systems,” inInterspeech 2025, 2025, pp. 5128–5132

  39. [39]

    Available: https://arxiv.org/abs/2601.11846

    N. Tomashenko, X. Miao, P. Champion, S. Meyer, M. Panariello, X. Wang, N. Evans, E. Vincent, J. Yamagishi, and M. Todisco, “The third V oicePrivacy challenge: preserving emotional expres- siveness and linguistic content in voice anonymization,”arXiv preprint arXiv:2601.11846, 2026

  40. [40]

    The Risks and Detection of Overestimated Privacy Protection in V oice Anonymisation,

    M. Panariello, S. Meyer, P. Champion, X. Miao, M. Todisco, N. T. Vu, and N. Evans, “The Risks and Detection of Overestimated Privacy Protection in V oice Anonymisation,” in5th Symposium on Security and Privacy in Speech Communication, 2025, pp. 8– 12