arxiv: 2605.07291 · v1 · submitted 2026-05-08 · 📡 eess.AS

Recognition: 2 theorem links


Evaluating voice anonymisation using similarity rank disclosure

Shilpa Chandra, Matteo Pettenò, Nicholas Evans, Michele Panariello, Massimiliano Todisco, Tom Bäckström, Dorothea Kolossa, Rainer Martin, Themos Stafylakis, Nicolas Gengembre


Pith reviewed 2026-05-11 00:51 UTC · model grok-4.3

classification 📡 eess.AS

keywords: voice anonymisation · privacy evaluation · similarity rank disclosure · speaker verification · equal error rate · feature representations · VoicePrivacy Challenge


The pith

Similarity rank disclosure exposes voice anonymisation privacy leaks that equal error rates miss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current evaluation of voice anonymisation depends on equal error rates from automatic speaker verification systems, which vary with the choice of classifier and threshold and therefore give an incomplete view of privacy risk. The paper introduces similarity rank disclosure as an information-theoretic alternative that works directly on feature representations instead of classifier outputs. This metric supplies a threshold-independent score for both average and worst-case leakage of speaker identity. When applied to speaker embeddings, fundamental frequency, and phone embeddings from the 2024 VoicePrivacy Challenge systems, it uncovers specific leaks and system weaknesses that EER-based tests overlook. A reader should care because more reliable privacy measurement can guide the design of anonymisation methods that actually protect speaker identity across different evaluation setups.

Core claim

Similarity rank disclosure is an information-theoretic metric that ranks similarities between original and anonymised feature representations to quantify privacy leakage without reference to any classifier or decision threshold. Applied to the 2024 VoicePrivacy Challenge submissions, the metric detects privacy leaks and system-specific weaknesses in speaker embeddings, fundamental frequency, and phone embeddings that remain hidden when the same systems are assessed solely by equal error rate.

What carries the argument

Similarity rank disclosure (SRD), an information-theoretic rank metric computed on feature representations that measures how much original speaker identity remains distinguishable through ordered similarity scores.
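The rank at the heart of SRD can be illustrated with a short sketch. It assumes, for illustration only, that each speaker has a single reference (enrolment) embedding, that similarity is cosine similarity, and that a query's rank is the position of its true speaker's reference in the similarity-ordered list of all references. The names `cosine` and `matching_rank` are hypothetical and are not taken from the paper or from any SRD toolkit.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def matching_rank(query: Sequence[float], true_speaker: str,
                  references: Dict[str, Sequence[float]]) -> int:
    """Rank of the true speaker's reference when all references are sorted by
    similarity to the query (1 = most similar).

    Only impostors that are strictly more similar push the true speaker down,
    so ties are resolved in the attacker's favour.
    """
    target = cosine(query, references[true_speaker])
    more_similar = sum(
        1
        for speaker, ref in references.items()
        if speaker != true_speaker and cosine(query, ref) > target
    )
    return 1 + more_similar


# Toy example: three hypothetical speakers with 2-D "embeddings".
refs = {"spk_a": [1.0, 0.0], "spk_b": [0.0, 1.0], "spk_c": [0.7, 0.7]}
print(matching_rank([0.9, 0.1], "spk_a", refs))  # 1
```

Resolving ties in the attacker's favour errs towards reporting more disclosure; the paper may treat ties differently.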

If this is right

SRD identifies privacy leaks missed by EER-based evaluation of voice anonymisation systems.
The metric supports separate assessment of average and worst-case disclosure without choosing a threshold.
Representation-level analysis reveals weaknesses tied to particular features such as embeddings or fundamental frequency.
SRD functions as a flexible, interpretable complement to existing verification metrics for anonymisation evaluation.
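A threshold-free summary of average and worst-case disclosure can be sketched from the empirical rank distribution p̃_k. The definitions below are assumptions chosen for illustration, not the paper's formulas: zero disclosure corresponds to a uniform rank distribution over N speakers, average disclosure is taken as the Kullback–Leibler divergence from uniform (log2 N − H(p̃)), and worst-case disclosure as the largest per-rank log-ratio max_k log2(N p̃_k), which equals min-entropy leakage. `rank_distribution` and `disclosure_bits` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def rank_distribution(ranks: Sequence[int], n_speakers: int) -> List[float]:
    """Empirical probability p_k that the matching speaker appears at rank k, k = 1..N."""
    if not ranks:
        raise ValueError("need at least one rank")
    counts = Counter(ranks)
    return [counts.get(k, 0) / len(ranks) for k in range(1, n_speakers + 1)]


def disclosure_bits(p: Sequence[float]) -> Dict[str, float]:
    """Illustrative average- and worst-case disclosure in bits (assumed definitions).

    Reference point: a uniform distribution (p_k = 1/N) discloses 0 bits.
    mean  = sum_k p_k * log2(N * p_k)   (KL divergence from uniform)
    worst = max_k log2(N * p_k)         (min-entropy leakage)
    Ranks with p_k = 0 contribute nothing and are skipped.
    """
    n = len(p)
    mean = sum(pk * math.log2(n * pk) for pk in p if pk > 0)
    worst = max(math.log2(n * pk) for pk in p if pk > 0)
    return {"mean": mean, "worst": worst}
```

For N = 4 speakers, ranks [1, 1, 1, 1] give 2 bits under both measures, ranks [1, 2, 3, 4] give 0 bits, and ranks [1, 1, 1, 2] give about 1.19 bits on average and about 1.58 bits in the worst case. Neither number requires choosing a decision threshold.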

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Adoption of SRD would encourage anonymisation algorithms to reduce ranked similarity leakage rather than optimise only for classifier error rates.
The same rank-based approach could be tested on other biometric modalities where feature representations are available.
Worst-case SRD scores might become a practical target for regulatory or auditing standards that require guaranteed privacy bounds.

Load-bearing premise

That operating on feature representations with an information-theoretic rank metric supplies a more complete and less misleading characterisation of privacy risk than classifier-dependent equal error rates.

What would settle it

A controlled test in which the same anonymised utterances produce a high equal error rate (suggesting strong privacy) but high similarity rank disclosure (suggesting substantial leakage), or the reverse mismatch, on identical data.

Figures

Figures reproduced from arXiv: 2605.07291 by Dorothea Kolossa, Massimiliano Todisco, Matteo Pettenò, Michele Panariello, Nicholas Evans, Nicolas Gengembre, Rainer Martin, Shilpa Chandra, Themos Stafylakis, Tom Bäckström.

**Figure 1.** Typical and ideal rank histogram distributions for original and anonymised data. … is derived from a set of inputs x. The histogram is normalized as an empirical probability distribution p̃_k of matching speaker rank. Example empirical probability distributions are shown in …

**Figure 2.** Rank histograms for the ECAPA-TDNN embeddings ET_orig and non-timbral embeddings W-NT_orig, for original speech (top left plot only) and anonymised speech (all others). [Plot residue: log-scaled axes; panels titled System: ORIGINAL, B3, B4, B5, T8-5, T10-2, …; legend entries Phone and F0.]

**Figure 3.** As for …

**Figure 4.** As for …

Original abstract

The evaluation of voice anonymisation remains challenging. Current practice relies on automatic speaker verification metrics such as the equal error rate (EER). Performance estimates dependent on the classifier and operating point provide an incomplete or even misleading characterisation of privacy risk. We investigate the use of similarity rank disclosure (SRD), an information-theoretic metric, which operates on feature representations rather than classifier decisions, providing a threshold-independent assessment of privacy and analysis of both average and worst-case disclosure. We report its application to speaker embeddings, fundamental frequency, and phone embeddings using 2024 VoicePrivacy Challenge systems. The SRD reveals privacy leaks and system-specific weaknesses missed by EER-based evaluation. Findings highlight the merit of representation-level metrics and demonstrate the potential of SRD as a flexible and interpretable tool for the evaluation of voice anonymisation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SRD flags privacy leaks in 2024 voice anonymization systems that EER misses, but without tests against actual attack success the advantage stays unproven.


The main point is that this work applies similarity rank disclosure to evaluate voice anonymization on the 2024 VoicePrivacy Challenge systems and finds it picks up leaks that EER misses. The metric works on feature representations like speaker embeddings, F0, and phone embeddings in a threshold-free way. It does a solid job showing system-specific weaknesses through this representation-level analysis. By focusing on rank disclosure rather than classifier outputs, it avoids some of the biases in standard EER calculations and gives both overall and worst-case insights. That's a practical addition for people testing anonymization tools. The soft spot is the missing link to real attacks. The claim that SRD reveals more actionable privacy risk would be stronger with experiments comparing identification success rates on high-SRD versus low-SRD cases, or against EER predictions. As it stands, the extra disclosures could be noise rather than exploitable information. This paper is for the voice privacy evaluation crowd. A reader interested in metric development would get value from seeing SRD in action on current systems. It has enough new application and clear thinking to merit peer review, though it would benefit from tighter validation. I recommend sending it to referees.

Referee Report

2 major / 2 minor

Summary. The paper claims that similarity rank disclosure (SRD), an information-theoretic metric operating directly on feature representations (speaker embeddings, F0, phone embeddings), provides a threshold-independent evaluation of voice anonymisation privacy that reveals leaks and system-specific weaknesses missed by equal error rate (EER) assessments. It demonstrates this by applying SRD to the 2024 VoicePrivacy Challenge systems.

Significance. If the central claim holds, the work supplies a concrete, representation-level alternative to classifier-dependent EER that could improve privacy auditing in voice anonymisation. The explicit application to recent challenge systems and the focus on both average and worst-case disclosure are strengths that would make the metric practically useful if validated.

major comments (2)

[§4, §5] §4 (Results) and §5 (Discussion): the central assertion that SRD 'reveals privacy leaks and system-specific weaknesses missed by EER' is load-bearing but unsupported by any side-by-side comparison. No table or figure shows that utterances or systems flagged as high-SRD by the rank metric actually produce higher identification success rates under a concrete attacker model than EER predicts.
[§3.2] §3.2 (SRD definition): the information-theoretic rank disclosure is defined on feature representations, yet the manuscript provides no quantitative check that the additional disclosures flagged by SRD correspond to practically exploitable privacy risks rather than statistical dependencies that remain non-actionable for an adversary.

minor comments (2)

[§3] Notation for the SRD formula is introduced without an explicit equation number; cross-referencing would improve readability.
[Abstract, §4] The abstract states that SRD is 'threshold-independent,' but the results section should explicitly contrast the operating-point dependence of EER with the rank-based formulation to make the advantage concrete.
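For contrast, the EER can be sketched by sweeping a decision threshold over verifier scores and reporting the point where the false acceptance and false rejection rates meet. The result depends on which verifier produced the scores, which is the dependence the minor comment asks the authors to make concrete. This is a generic illustration over observed scores, not the VoicePrivacy Challenge evaluation code; `equal_error_rate` is a hypothetical name.

```python
from typing import Sequence, Tuple


def equal_error_rate(target: Sequence[float],
                     nontarget: Sequence[float]) -> Tuple[float, float]:
    """Approximate EER by sweeping the threshold over every observed score.

    A trial is accepted when score >= threshold. Returns (eer, threshold) at the
    threshold where false acceptance and false rejection rates are closest.
    """
    best_gap, best_eer, best_t = float("inf"), 1.0, 0.0
    for t in sorted(set(target) | set(nontarget)):
        frr = sum(s < t for s in target) / len(target)         # genuine trials rejected
        far = sum(s >= t for s in nontarget) / len(nontarget)  # impostor trials accepted
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_t = abs(far - frr), (far + frr) / 2, t
    return best_eer, best_t


# Hypothetical verifier scores for genuine (same-speaker) and impostor trials.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(eer, threshold)  # 0.25 0.6
```

In anonymisation evaluation a higher EER indicates stronger privacy, with 50% corresponding to chance-level verification. Sweeping only the observed scores is a simple approximation; toolkit implementations may interpolate between operating points.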

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important aspects of how SRD should be positioned relative to EER and practical attacker models. We address each major comment below and propose targeted revisions to improve clarity and strengthen the claims without overstating the current evidence.


Referee: [§4, §5] §4 (Results) and §5 (Discussion): the central assertion that SRD 'reveals privacy leaks and system-specific weaknesses missed by EER' is load-bearing but unsupported by any side-by-side comparison. No table or figure shows that utterances or systems flagged as high-SRD by the rank metric actually produce higher identification success rates under a concrete attacker model than EER predicts.

Authors: We agree that a direct empirical comparison between SRD-flagged cases and actual identification success rates under a concrete attacker would provide stronger validation. The current manuscript demonstrates that SRD can surface differences (e.g., systems with comparable EER but divergent SRD) at the representation level, which EER cannot capture because it depends on a specific verifier and threshold. However, we do not include an explicit attacker simulation linking high-SRD utterances to higher success rates. In revision we will add a new paragraph in §5 that (a) explicitly states this limitation, (b) illustrates with the existing data how rank information could be leveraged by an adversary (e.g., nearest-neighbor lookup in the embedding space), and (c) outlines a possible future experiment that would close the gap. This keeps the focus on the representation-level contribution while acknowledging the missing link to end-to-end attack performance. revision: partial
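The nearest-neighbour attacker mentioned in this response can be related to ranks directly: if an attacker returns its k most similar enrolled speakers, its top-k identification rate is the fraction of trials whose matching-speaker rank is at most k (a cumulative match characteristic). The sketch below illustrates that relationship under the same single-reference assumption; it is not an experiment from the paper, and `cumulative_match` and `chance_rate` are hypothetical names.

```python
from typing import List, Sequence


def cumulative_match(ranks: Sequence[int], n_speakers: int) -> List[float]:
    """Top-k identification rate for k = 1..N from matching-speaker ranks.

    Entry k-1 is the fraction of trials in which an attacker returning its k
    most similar enrolled speakers would include the true speaker.
    """
    total = len(ranks)
    return [sum(r <= k for r in ranks) / total for k in range(1, n_speakers + 1)]


def chance_rate(k: int, n_speakers: int) -> float:
    """Top-k rate of an attacker guessing uniformly at random among N speakers."""
    return k / n_speakers


ranks = [1, 1, 2, 4]  # hypothetical ranks for four anonymised trials, N = 4
print(cumulative_match(ranks, 4))  # [0.5, 0.75, 0.75, 1.0]
print(chance_rate(1, 4))           # 0.25
```

Comparing each entry with the chance rate k/N is one concrete way to check whether rank disclosure translates into attack success, which is the link the referee asks for.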
Referee: [§3.2] §3.2 (SRD definition): the information-theoretic rank disclosure is defined on feature representations, yet the manuscript provides no quantitative check that the additional disclosures flagged by SRD correspond to practically exploitable privacy risks rather than statistical dependencies that remain non-actionable for an adversary.

Authors: SRD is intentionally defined on the raw feature representations precisely because voice anonymization is intended to prevent an adversary from extracting usable speaker information from those representations. The rank statistic directly quantifies how much similarity ordering is preserved, which is actionable for any downstream nearest-neighbor or ranking-based attack. That said, the manuscript does not provide a quantitative mapping from SRD values to measured attack success rates. We will revise §3.2 to include a short paragraph explaining why representation-level rank disclosure is a conservative (i.e., upper-bound) indicator of risk, and we will expand the discussion in §5 to reference existing literature on embedding-based attacks that could exploit the disclosed ranks. No new attack experiments will be added in this revision, but the text will make the actionability argument more explicit. revision: partial

Circularity Check

0 steps flagged

SRD introduced as independent information-theoretic metric on features; no reduction to EER or self-inputs

full rationale

The paper defines SRD explicitly as an information-theoretic metric operating directly on feature representations (speaker embeddings, F0, phone embeddings) rather than classifier outputs or decisions. This is positioned as an alternative to EER-based evaluation without deriving SRD from EER, fitting parameters to EER data, or invoking self-citations as load-bearing uniqueness theorems. The reported application to 2024 VoicePrivacy systems and the claim that SRD reveals additional leaks are empirical observations, not tautological redefinitions or predictions forced by construction. No equations or steps in the provided text reduce the central metric or claims to the inputs by definition, self-citation chains, or ansatzes smuggled from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are described. SRD is presented as an information-theoretic metric applied to existing feature types.

pith-pipeline@v0.9.0 · 5470 in / 1075 out tokens · 53215 ms · 2026-05-11T00:51:13.774454+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


The SRD is an information-theoretic metric, which operates on feature representations rather than classifier decisions, providing a threshold-independent assessment of privacy and analysis of both average and worst-case disclosure.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 7 canonical work pages · 1 internal anchor

[1]

Beyond voice identity, speech recordings can reveal sensitive attributes such as the speaker’s age, gender, emotional state, etc

Introduction Smart devices and cloud services are nowadays constantly capturing and processing speech data [1, 2]. Beyond voice identity, speech recordings can reveal sensitive attributes such as the speaker’s age, gender, emotional state, etc. [2, 3]. The pervasive capture and inherent sensitivity of speech data, coupled with evolving privacy regul...

2020
[2]

Evaluating voice anonymisation using similarity rank disclosure

Related work The strength of any approach to privacy protection is usually estimated empirically according to a particular threat model [12] and simulated attacks launched to defeat the protection. The strength is then quantified according to some objective metric that indicates the attack success rate. In the case of voice anonymisation, the metric r...

2020
[3]

Information is quantified in bits, hence enabling comparisons between the disclosure of information contained in different speech characteristics

Similarity Rank Disclosure The SRD provides a framework to measure PII contained within speech utterances [7]. Information is quantified in bits, hence enabling comparisons between the disclosure of information contained in different speech characteristics. Operating directly upon speech features instead of classifier decisions, the SRD provides a class...
[4]

In the following section, we describe the data used and a set of four feature representations

Experimental Setup While the original work [7] reports a study of privacy disclosure for original, unprotected speech data, we have applied the SRD to the study and comparison of privacy disclosure for speech data treated with different approaches to voice anonymisation [6]. In the following section, we describe the data used and a set of four feature...

2024
[5]

Results We present rank histograms for features described in Section 4.2 and differences in qualitative results derived using a stronger semi-informed attack model. We then present quantitative results derived using metrics described in Section 3.3, followed by a comparison to results from the use of statistical approximations described in Section 3.2...
[6]

These go beyond a single snapshot like that provided from estimates of the EER in the form of the mean and worst case disclosure and the rank spread

Discussion The SRD provides revealing insights into the differences in privacy protection for competing anonymisation solutions. These go beyond a single snapshot like that provided from estimates of the EER in the form of the mean and worst case disclosure and the rank spread. By casting evaluation as an identification problem instead of verification...

2024
[7]

Compared to the automatic speaker verification equal error rate (EER), the SRD offers a more interpretable and fine-grained characterisation of residual privacy risk

Conclusions We investigated use of the similarity rank disclosure (SRD) for evaluating voice anonymisation, providing an information-theoretic assessment of privacy. Compared to the automatic speaker verification equal error rate (EER), the SRD offers a more interpretable and fine-grained characterisation of residual privacy risk. Results for 2024 Voi...

2024
[8]

We would also like to thank Rayane Bakari, Nicolas Gengembre, and Olivier Le Blouch (Orange innovation, France) for providing the pre-trained models for W-NT

Acknowledgements This work was funded by the European Union’s Horizon Europe research and innovation programme grant No 101168193. We would also like to thank Rayane Bakari, Nicolas Gengembre, and Olivier Le Blouch (Orange innovation, France) for providing the pre-trained models for W-NT
[9]

Understanding the tradeoffs in client-side privacy for downstream speech tasks,

P. Wu, P. P. Liang, J. Shi, R. Salakhutdinov, S. Watanabe, and L.-P. Morency, “Understanding the tradeoffs in client-side privacy for downstream speech tasks,” in 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2021, pp. 841–848

2021
[10]

Privacy in speech technology,

T. Bäckström, “Privacy in speech technology,” Proceedings of the IEEE, vol. 113, no. 7, pp. 668–692, 2025

2025
[11]

Preserving privacy in speaker and speech characterisation,

A. Nautsch, A. Jiménez, A. Treiber, J. Kolberg, C. Jasserand, E. Kindt, H. Delgado, M. Todisco, M. A. Hmani, A. Mtibaa, M. A. Abdelraheem, A. Abad, F. Teixeira, D. Matrouf, M. Gomez-Barrero, D. Petrovska-Delacrétaz, G. Chollet, N. Evans, T. Schneider, J.-F. Bonastre, B. Raj, I. Trancoso, and C. Busch, “Preserving privacy in speaker and speech characteris...

2019
[12]

The GDPR & Speech Data: Reflections of Legal and Technology Communities, First Steps Towards a Common Understanding,

A. Nautsch, C. Jasserand, E. Kindt, M. Todisco, I. Trancoso, and N. Evans, “The GDPR & Speech Data: Reflections of Legal and Technology Communities, First Steps Towards a Common Understanding,” in Interspeech 2019. ISCA, Sep. 2019, pp. 3695–3699

2019
[13]

Introducing the VoicePrivacy Initiative,

N. Tomashenko, B. M. L. Srivastava, X. Wang, E. Vincent, A. Nautsch, J. Yamagishi, N. Evans, J. Patino, J.-F. Bonastre, P.-G. Noé, and M. Todisco, “Introducing the VoicePrivacy Initiative,” in Interspeech 2020. ISCA, Oct. 2020, pp. 1693–1697

2020
[14]

The VoicePrivacy 2024 Challenge Evaluation Plan,

N. Tomashenko, X. Miao, P. Champion, S. Meyer, X. Wang, E. Vincent, M. Panariello, N. Evans, J. Yamagishi, and M. Todisco. The VoicePrivacy 2024 Challenge Evaluation Plan. [Online]. Available: http://arxiv.org/abs/2404.02677

2024
[15]

Privacy disclosure of similarity rank in speech and language processing,

T. Bäckström, M. H. Vali, M. Nguyen, and S. Rech, “Privacy disclosure of similarity rank in speech and language processing,” IEEE Transactions on Audio, Speech and Language Processing, vol. 34, pp. 196–205, 2026

2026
[16]

YIN, a fundamental frequency estimator for speech and music,

A. de Cheveigné and H. Kawahara, “YIN, a fundamental frequency estimator for speech and music,” The Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1917–1930, Apr. 2002. [Online]. Available: https://doi.org/10.1121/1.1458024

2002
[17]

Alexandre Défossez, Jade Copet, Gabriel Synnaeve, and Yossi Adi
[18]

Vector-Quantized Neural Networks for Acoustic Unit Discovery in the ZeroSpeech 2020 Challenge,

B. van Niekerk, L. Nortje, and H. Kamper, “Vector-Quantized Neural Networks for Acoustic Unit Discovery in the ZeroSpeech 2020 Challenge,” in Interspeech 2020, 2020, pp. 4836–4840

2020
[19]

ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification,

B. Desplanques, J. Thienpondt, and K. Demuynck, “ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification,” in Interspeech 2020, 2020, pp. 3830–3834

2020
[20]

The influence of non-timbral cues in voice anonymisation and evaluation,

R. Bakari, O. L. Blouch, N. Evans, N. Gengembre, M. Panariello, and M. Todisco, “The influence of non-timbral cues in voice anonymisation and evaluation,” in 5th Symposium on Security and Privacy in Speech Communication, 2025, pp. 35–42

2025
[21]

Scenario of Use Scheme: Threat Modelling for Speaker Privacy Protection in the Medical Domain,

M. U. Rahman, M. Larson, L. ten Bosch, and C. Tejedor-García, “Scenario of Use Scheme: Threat Modelling for Speaker Privacy Protection in the Medical Domain,” in 4th Symposium on Security and Privacy in Speech Communication, 2024, pp. 21–25

2024
[22]

A comparative study of speech anonymization metrics,

M. Maouche, B. M. L. Srivastava, N. Vauquier, A. Bellet, M. Tommasi, and E. Vincent, “A comparative study of speech anonymization metrics,” in INTERSPEECH 2020, 2020

2020
[23]

The VoicePrivacy 2020 challenge: results and findings,

N. Tomashenko, X. Wang, E. Vincent, J. Patino, B. M. L. Srivastava, P.-G. Noé, A. Nautsch, N. Evans, J. Yamagishi, B. O’Brien et al., “The VoicePrivacy 2020 challenge: results and findings,” Computer Speech & Language, vol. 74, p. 101362, 2022

2020
[24]

Out of a Hundred Trials, How Many Errors Does Your Speaker Verifier Make?

N. Brümmer, L. Ferrer, and A. Swart, “Out of a Hundred Trials, How Many Errors Does Your Speaker Verifier Make?” in Interspeech 2021, 2021, pp. 1059–1063

2021
[25]

t-eer: Parameter-free tandem evaluation of countermeasures and biometric comparators,

T. H. Kinnunen, K. A. Lee, H. Tak, N. Evans, and A. Nautsch, “t-eer: Parameter-free tandem evaluation of countermeasures and biometric comparators,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 5, pp. 2622–2637, 2024

2024
[26]

The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment,

A. Nautsch, J. Patino, N. Tomashenko, J. Yamagishi, P.-G. Noé, J.-F. Bonastre, M. Todisco, and N. Evans, “The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment,” in Interspeech 2020, 2020, pp. 1698–1702

2020
[27]

Application-independent evaluation of speaker detection,

N. Brümmer and J. du Preez, “Application-independent evaluation of speaker detection,” Computer Speech & Language, vol. 20, no. 2, pp. 230–275, 2006, Odyssey 2004: The Speaker and Language Recognition Workshop. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0885230805000483

2006
[28]

General framework to evaluate unlinkability in biometric template protection systems,

M. Gomez-Barrero, J. Galbally, C. Rathgeb, and C. Busch, “General framework to evaluate unlinkability in biometric template protection systems,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 6, pp. 1406–1420, 2017

2017
[29]

Legally validated evaluation framework for voice anonymization,

N. Vauquier, B. M. L. Srivastava, S. A. Hosseini, and E. Vincent, “Legally validated evaluation framework for voice anonymization,” in Interspeech 2025, 2025, pp. 3229–3233

2025
[30]

The First VoicePrivacy Attacker Challenge Evaluation Plan,

N. Tomashenko, X. Miao, E. Vincent, and J. Yamagishi. The First VoicePrivacy Attacker Challenge Evaluation Plan. [Online]. Available: http://arxiv.org/abs/2410.07428
[31]

The VoicePrivacy 2022 Challenge Evaluation Plan,

N. Tomashenko, X. Wang, X. Miao, H. Nourtel, P. Champion, M. Todisco, E. Vincent, N. Evans, J. Yamagishi, and J.-F. Bonastre. The VoicePrivacy 2022 Challenge Evaluation Plan. [Online]. Available: http://arxiv.org/abs/2203.12468

2022
[32]

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing,

S. Chen, C. Wang, Z. Chen, Y. Wu, S. Liu, Z. Chen, J. Li, N. Kanda, T. Yoshioka, X. Xiao, J. Wu, L. Zhou, S. Ren, Y. Qian, Y. Qian, J. Wu, M. Zeng, X. Yu, and F. Wei, “WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing,” IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 6, pp. 1505–1518, 2022

2022
[33]

Librispeech: An ASR corpus based on public domain audio books,

V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, “Librispeech: An ASR corpus based on public domain audio books,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 5206–5210

2015
[34]

SpeechBrain: A general-purpose speech toolkit,

M. Ravanelli, T. Parcollet, P. Plantinga, A. Rouhe, S. Cornell, L. Lugosch, C. Subakan, N. Dawalatabad, A. Heba, J. Zhong, J.-C. Chou, S.-L. Yeh, S.-W. Fu, C.-F. Liao, E. Rastorgueva, F. Grondin, W. Aris, H. Na, Y. Gao, R. D. Mori, and Y. Bengio. SpeechBrain: A General-Purpose Speech Toolkit. [Online]. Available: http://arxiv.org/abs/2106.04624
[35]

Disentangling prosody and timbre embeddings via voice conversion,

N. Gengembre, O. Le Blouch, and C. Gendrot, “Disentangling prosody and timbre embeddings via voice conversion,” in Interspeech 2024, 2024, pp. 2765–2769

2024
[36]

PYIN a fundamental frequency estimator using probabilistic threshold distributions,

M. Mauch and S. Dixon, “PYIN a fundamental frequency estimator using probabilistic threshold distributions,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 659–663

2014
[37]

Interpretable Latent Space Using Space-Filling Curves for Phonetic Analysis in Voice Conversion,

M. H. Vali and T. Bäckström, “Interpretable Latent Space Using Space-Filling Curves for Phonetic Analysis in Voice Conversion,” in Interspeech 2023, 2023, pp. 306–310

2023
[38]

Exploiting Context-dependent Duration Features for Voice Anonymization Attack Systems,

N. Tomashenko, E. Vincent, and M. Tommasi, “Exploiting Context-dependent Duration Features for Voice Anonymization Attack Systems,” in Interspeech 2025, 2025, pp. 5128–5132

2025
[39]

The third VoicePrivacy challenge: preserving emotional expressiveness and linguistic content in voice anonymization,

N. Tomashenko, X. Miao, P. Champion, S. Meyer, M. Panariello, X. Wang, N. Evans, E. Vincent, J. Yamagishi, and M. Todisco, “The third VoicePrivacy challenge: preserving emotional expressiveness and linguistic content in voice anonymization,” arXiv preprint arXiv:2601.11846, 2026

2026
[40]

The Risks and Detection of Overestimated Privacy Protection in Voice Anonymisation,

M. Panariello, S. Meyer, P. Champion, X. Miao, M. Todisco, N. T. Vu, and N. Evans, “The Risks and Detection of Overestimated Privacy Protection in Voice Anonymisation,” in 5th Symposium on Security and Privacy in Speech Communication, 2025, pp. 8–12

2025