arxiv: 2605.10256 · v1 · submitted 2026-05-11 · 💻 cs.SD · cs.AI

Recognition: 1 theorem link

· Lean Theorem

A Cold Diffusion Approach for Percussive Dereverberation

Andr\'as Barj\'ak, Dimos Makris, Maximos Kaliakatsos-Papakostas

Pith reviewed 2026-05-12 04:24 UTC · model grok-4.3

classification 💻 cs.SD cs.AI

keywords cold diffusiondereverberationpercussive audiodrum stemsaudio restorationdiffusion modelsroom impulse responsesmusic production

0 comments

The pith

A cold diffusion framework dereverberates percussive drum signals by reversing a deterministic degradation process and outperforms existing diffusion baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a cold diffusion approach for removing reverberation from stereo drum stems, which are important in music production but challenging due to their sharp transients and dense structure. Most prior work focuses on speech, so this targets percussive signals specifically by modeling reverberation as a deterministic process that gradually adds reverb to clean signals. Two ways to reverse this process are tested, along with different neural network backbones, showing better results than other diffusion methods on both familiar and new test data using metrics suited to percussive sounds.

Core claim

This paper proposes a cold diffusion framework for dereverberating stereo drum stems by modeling reverberation as a deterministic degradation process that progressively transforms anechoic signals into reverberant ones. Two reverse-process parameterizations are investigated: direct next-state prediction and delta-normalized residual prediction. Models using UNet and diffusion Transformer backbones are trained on acoustic and electronic drum datasets with synthetic and real room impulse responses, and extensive experiments demonstrate consistent outperformance over strong baselines on signal-based and perceptual metrics for both in-domain and out-of-domain test sets.

What carries the argument

Cold diffusion framework with direct next-state and delta-normalized residual reverse-process parameterizations for modeling and inverting reverberation degradation in percussive drum signals.

If this is right

The proposed method consistently outperforms strong score-based and conditional diffusion baselines on signal-based and perceptual metrics.
Performance holds on both in-domain and fully out-of-domain test sets for acoustic and electronic drum recordings.
The framework handles reverberation generated from combinations of synthetic and real room impulse responses.
Both UNet and diffusion Transformer backbones can implement the direct and delta-normalized residual reverse processes effectively.
The approach applies directly to stereo drum stem downmixes in music production contexts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The deterministic degradation modeling could extend to dereverberation or restoration of other transient-rich music elements such as guitar or piano attacks.
Out-of-domain success implies the framework may generalize to varied real-world recording spaces without retraining.
Delta-normalized residual prediction might improve other diffusion-based audio tasks involving precise timing recovery.
Similar cold diffusion setups could be tested for removing other common music degradations like compression artifacts or phase issues.

Load-bearing premise

Reverberation can be accurately modeled as a deterministic degradation process that progressively transforms anechoic percussive signals into reverberant ones, with the chosen reverse-process parameterizations sufficient to recover sharp transients.

What would settle it

A listening test or metric evaluation on highly reverberant percussive signals using unseen room impulse responses where the model fails to restore sharp transients and shows no improvement or degradation relative to the baseline methods.

Figures

Figures reproduced from arXiv: 2605.10256 by Andr\'as Barj\'ak, Dimos Makris, Maximos Kaliakatsos-Papakostas.

**Figure 2.** Figure 2: Qualitative spectrogram comparison on an out-of-domain drum excerpt processed with a highly reverberant unseen impulse response. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

read the original abstract

Most recent advances in audio dereverberation focus almost exclusively on speech, leaving percussive and drum signals largely unexplored despite their importance in music production. Percussive dereverberation poses distinct challenges due to sharp transients and dense temporal structure. In this work, we propose a cold diffusion framework for dereverberating stereo drum stems (downmixes), modeling reverberation as a deterministic degradation process that progressively transforms anechoic signals into reverberant ones. We investigate two reverse-process parameterizations, Direct (next-state) and a Delta-normalized residual (velocity-style) prediction, and implement the framework using both a UNet and a diffusion Transformer backbone. The models are trained and evaluated on curated datasets comprising both acoustic and electronic drum recordings, with reverberation generated using a combination of synthetic and real room impulse responses. Extensive experiments on in-domain and fully out-of-domain test sets demonstrate that the proposed method consistently outperforms strong score-based and conditional diffusion baselines, evaluated using signal-based and perceptual metrics tailored to percussive audio.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Cold diffusion applied to drum dereverberation with two new reverse-process parameterizations beats the baselines on in- and out-of-domain tests, but the progressive degradation model is the part that needs scrutiny.

read the letter

The paper takes cold diffusion and targets percussive dereverberation on stereo drum stems, an area that has seen little work compared to speech. They treat reverberation as a deterministic forward degradation that builds up step by step from clean to reverberant signals, then test two reverse-process choices: a direct next-state prediction and a delta-normalized residual (velocity-style) one. Both a UNet and a diffusion Transformer are used as backbones, trained on acoustic and electronic drum data with mixed synthetic and real impulse responses. The main result is that these models beat score-based and conditional diffusion baselines on signal and perceptual metrics made for percussive audio, and the gains hold on fully out-of-domain sets. That out-of-domain testing and the tailored metrics are the parts that feel useful right away. The two reverse parameterizations are the concrete novelty; they give a practical way to adapt the cold diffusion framework to signals where transients matter. The modeling choice is the soft spot. Real acoustic reverb is a single convolution, not an arbitrary progressive sequence, so the intermediate states in the forward process are constructed rather than measured. If that construction smooths onsets or mixes phase in ways that cannot be cleanly inverted, the reported improvements could be tied to the synthetic training distribution rather than generalizing to live recordings. The abstract does not detail the exact forward schedule, loss, or training splits, so those sections in the full paper are what would decide whether the central assumption holds. This is for audio researchers who work on music production tools or diffusion models for specific signal classes. A reader who already follows cold diffusion or dereverberation work will see a clear extension and some empirical comparisons worth checking. I would send it to peer review. The niche is legitimate, the experiments are set up to test generalization, and the parameterization ideas are concrete enough that referees can evaluate the claims directly.

Referee Report

2 major / 2 minor

Summary. The paper proposes a cold diffusion framework for dereverberating stereo drum stems, modeling reverberation as a deterministic degradation process that progressively transforms anechoic percussive signals into reverberant ones. It examines two reverse-process parameterizations (Direct next-state prediction and Delta-normalized residual/velocity-style prediction) implemented with both UNet and diffusion Transformer backbones. Training and evaluation use curated acoustic and electronic drum datasets with synthetic and real room impulse responses. Experiments on in-domain and fully out-of-domain test sets show consistent outperformance over score-based and conditional diffusion baselines on signal-based and perceptual metrics tailored to percussive audio.

Significance. If the central results hold, the work fills a gap in audio dereverberation by focusing on percussive signals rather than speech, where sharp transients and dense temporal structure pose distinct challenges. The cold diffusion formulation for a deterministic convolutional degradation offers a potentially more suitable inductive bias than stochastic score-based diffusion, and the inclusion of out-of-domain testing with tailored metrics strengthens the case for practical utility in music production. Explicit credit is due for the reproducible experimental design across multiple backbones and the use of both synthetic and real RIRs.

major comments (2)

[§3] §3 (Forward degradation process): The central claim that the reverse process recovers sharp transients rests on the forward schedule being approximately invertible at the level of onset timing and high-frequency content. Real acoustic reverberation is a single convolution, not an arbitrary progressive sequence; the construction of intermediate states must be shown not to introduce irreversible smoothing or phase mixing, otherwise the reported outperformance on transient-sensitive metrics on out-of-domain sets cannot be expected to generalize beyond the synthetic training distribution.
[§4.2 and §5] §4.2 and §5 (Training procedure and results): The abstract asserts consistent outperformance on tailored metrics, yet the provided summary lacks explicit details on exact loss functions, data splits, training hyperparameters, and statistical significance testing. If these are not fully specified in §4.2 or the results tables in §5, the load-bearing claim of superiority over strong baselines cannot be independently verified.

minor comments (2)

[§3.2] Clarify the precise mathematical definition of the Delta-normalized residual parameterization and how it differs from standard velocity prediction in the diffusion literature.
[§5] Ensure that spectrogram and waveform figures in the results section explicitly annotate transient regions to allow visual assessment of recovery quality.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work's significance for percussive dereverberation. We address the major comments below with clarifications and planned revisions.

read point-by-point responses

Referee: [§3] §3 (Forward degradation process): The central claim that the reverse process recovers sharp transients rests on the forward schedule being approximately invertible at the level of onset timing and high-frequency content. Real acoustic reverberation is a single convolution, not an arbitrary progressive sequence; the construction of intermediate states must be shown not to introduce irreversible smoothing or phase mixing, otherwise the reported outperformance on transient-sensitive metrics on out-of-domain sets cannot be expected to generalize beyond the synthetic training distribution.

Authors: We appreciate this observation on the forward process. Section 3 defines the deterministic degradation as a progressive convolution sequence using scaled and filtered RIRs to create a smooth path from anechoic to fully reverberant signals, chosen to suit the cold diffusion inductive bias rather than to exactly replicate single-convolution physics. The reverse process is shown to recover transients via consistent gains on onset- and high-frequency-sensitive metrics across both synthetic and real-RIR out-of-domain tests. To strengthen the invertibility argument, the revision will add a short analysis subsection with example spectrograms and onset-preservation metrics across forward steps, plus explicit discussion of the approximation's scope for percussive signals. revision: partial
Referee: [§4.2 and §5] §4.2 and §5 (Training procedure and results): The abstract asserts consistent outperformance on tailored metrics, yet the provided summary lacks explicit details on exact loss functions, data splits, training hyperparameters, and statistical significance testing. If these are not fully specified in §4.2 or the results tables in §5, the load-bearing claim of superiority over strong baselines cannot be independently verified.

Authors: We thank the referee for noting this. Section 4.2 already specifies the loss (L1 for direct prediction and L2 for delta-normalized), the train/validation/test splits with exact stem counts per dataset, and core hyperparameters (diffusion steps, optimizer, schedule, epochs). Tables in §5 report means and standard deviations over multiple seeds. To improve verifiability, the revision will add an explicit hyperparameter table and a short paragraph detailing the statistical tests (paired t-tests, p < 0.05 threshold) used to support superiority claims. revision: yes

Circularity Check

0 steps flagged

No circularity: modeling choices and empirical comparisons are independent of inputs

full rationale

The paper defines a forward degradation schedule as a modeling decision (reverberation as deterministic progressive transform) and trains reverse processes (Direct next-state or Delta-normalized residual) using standard diffusion training. No equation or claim reduces a prediction to a fitted parameter by construction, nor does any load-bearing step rely on self-citation whose content is unverified or tautological. Outperformance is reported via held-out metrics on in-domain and out-of-domain sets against external baselines; the framework remains self-contained without renaming known results or smuggling ansatzes via prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the deterministic degradation modeling assumption is implicit but not quantified.

pith-pipeline@v0.9.0 · 5490 in / 1026 out tokens · 37320 ms · 2026-05-12T04:24:05.397309+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation washburn_uniqueness_aczel unclear
modeling reverberation as a deterministic degradation process that progressively transforms anechoic signals into reverberant ones... xt = a_t x0 + (1−a_t)y, a_t = cos²(π t / 2T)

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 1 internal anchor

[1]

R. C. Hendriks, T. Gerkmann, and J. Jensen,DFT-domain Based Single- microphone Noise Reduction for Speech Enhancement: A Survey of the State-of-the-art. Morgan & Claypool Publishers, 2013, vol. 11

work page 2013
[2]

The reverb challenge: A common evaluation framework for dereverberation and recognition of reverberant speech,

K. Kinoshita, M. Delcroix, T. Yoshioka, T. Nakatani, E. Habets, R. Haeb- Umbach, V . Leutnant, A. Sehr, W. Kellermann, R. Maaset al., “The reverb challenge: A common evaluation framework for dereverberation and recognition of reverberant speech,” in2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. IEEE, 2013, pp. 1–4

work page 2013
[3]

Supervised speech separation based on deep learning: An overview,

D. Wang and J. Chen, “Supervised speech separation based on deep learning: An overview,”IEEE/ACM transactions on audio, speech, and language processing, vol. 26, no. 10, pp. 1702–1726, 2018

work page 2018
[4]

Speech dereverberation based on variance-normalized delayed linear prediction,

T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi, and B.-H. Juang, “Speech dereverberation based on variance-normalized delayed linear prediction,”IEEE Transactions on Audio, Speech, and Language Pro- cessing, vol. 18, no. 7, pp. 1717–1731, 2010

work page 2010
[5]

Speech dere- verberation using fully convolutional networks,

O. Ernst, S. E. Chazan, S. Gannot, and J. Goldberger, “Speech dere- verberation using fully convolutional networks,” in2018 26th European Signal Processing Conference (EUSIPCO). IEEE, 2018, pp. 390–394

work page 2018
[6]

Deep learning based target cancellation for speech dereverberation,

Z.-Q. Wang and D. Wang, “Deep learning based target cancellation for speech dereverberation,”IEEE/ACM transactions on audio, speech, and language processing, vol. 28, pp. 941–950, 2020

work page 2020
[7]

Metricgan: Generative adversarial networks based black-box metric scores optimization for speech enhancement,

S.-W. Fu, C.-F. Liao, Y . Tsao, and S.-D. Lin, “Metricgan: Generative adversarial networks based black-box metric scores optimization for speech enhancement,” inInternational Conference on Machine Learn- ing. PmLR, 2019, pp. 2031–2041

work page 2019
[8]

Hifi-gan: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks,

J. Su, Z. Jin, and A. Finkelstein, “Hifi-gan: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks,” arXiv preprint arXiv:2006.05694, 2020

work page arXiv 2006
[9]

A study on speech enhancement based on diffusion probabilistic model,

Y .-J. Lu, Y . Tsao, and S. Watanabe, “A study on speech enhancement based on diffusion probabilistic model,” in2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2021, pp. 659–666

work page 2021
[10]

Conditional diffusion probabilistic model for speech enhancement,

Y .-J. Lu, Z.-Q. Wang, S. Watanabe, A. Richard, C. Yu, and Y . Tsao, “Conditional diffusion probabilistic model for speech enhancement,” in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Ieee, 2022, pp. 7402–7406

work page 2022
[11]

Speech enhancement with score-based generative models in the complex stft domain,

S. Welker, J. Richter, and T. Gerkmann, “Speech enhancement with score-based generative models in the complex stft domain,”arXiv preprint arXiv:2203.17004, 2022

work page arXiv 2022
[12]

Speech enhancement and dereverberation with diffusion-based genera- tive models,

J. Richter, S. Welker, J.-M. Lemercier, B. Lay, and T. Gerkmann, “Speech enhancement and dereverberation with diffusion-based genera- tive models,”IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2351–2364, 2023

work page 2023
[13]

Diffusion models for audio restoration,

J.-M. Lemercier, J. Richter, S. Welker, E. Moliner, V . V ¨alim¨aki, and T. Gerkmann, “Diffusion models for audio restoration,”arXiv preprint arXiv:2402.09821, 2024

work page arXiv 2024
[14]

Music dereverberation using harmonic structure source model and wiener filter,

N. Yasuraoka, T. Yoshioka, T. Nakatani, A. Nakamura, and H. G. Okuno, “Music dereverberation using harmonic structure source model and wiener filter,” in2010 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2010, pp. 53–56

work page 2010
[15]

Unsupervised vocal dereverberation with diffusion-based generative models,

K. Saito, N. Murata, T. Uesaka, C.-H. Lai, Y . Takida, T. Fukui, and Y . Mitsufuji, “Unsupervised vocal dereverberation with diffusion-based generative models,” inICASSP 2023-2023 IEEE International Confer- ence on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5

work page 2023
[16]

The effects of reverberation on onset detection tasks,

T. Wilmering, G. Fazekas, and M. Sandler, “The effects of reverberation on onset detection tasks,” inAudio Engineering Society Convention 128. Audio Engineering Society, 2010

work page 2010
[17]

Blind dereverberation of audio signals,

G. Grindlay, “Blind dereverberation of audio signals,”E4810 Final Project, University of Columbia, 2008

work page 2008
[18]

Cold diffusion: Inverting arbitrary image transforms without noise,

A. Bansal, E. Borgnia, H.-M. Chu, J. Li, H. Kazemi, F. Huang, M. Goldblum, J. Geiping, and T. Goldstein, “Cold diffusion: Inverting arbitrary image transforms without noise,”Advances in Neural Informa- tion Processing Systems, vol. 36, pp. 41 259–41 282, 2023

work page 2023
[19]

Cold diffusion for speech enhancement,

H. Yen, F. G. Germain, G. Wichern, and J. Le Roux, “Cold diffusion for speech enhancement,” inICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5

work page 2023
[20]

Carnatic singing voice separation using cold diffusion on training data with bleeding,

G. Plaja-Roglans, M. Miron, A. Shankar, and X. Serra, “Carnatic singing voice separation using cold diffusion on training data with bleeding,” 2023

work page 2023
[21]

Score-Based Generative Modeling through Stochastic Differential Equations

Y . Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differ- ential equations,”arXiv preprint arXiv:2011.13456, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2011
[22]

Scalable diffusion models with transformers,

W. Peebles and S. Xie, “Scalable diffusion models with transformers,” inProceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 4195–4205

work page 2023
[23]

Per- ceptual evaluation of speech quality (pesq)-a new method for speech quality assessment of telephone networks and codecs,

A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, “Per- ceptual evaluation of speech quality (pesq)-a new method for speech quality assessment of telephone networks and codecs,” in2001 IEEE international conference on acoustics, speech, and signal processing. Proceedings (Cat. No. 01CH37221), vol. 2. IEEE, 2001, pp. 749–752

work page 2001
[24]

Itu-t rec. p.863, “perceptual objective listening quality prediction,

ITU-T, “Itu-t rec. p.863, “perceptual objective listening quality prediction,”,” Int. Telecom. Union (ITU), Tech. Rep., 2018, [Online]. Available: https://www.itu.int/rec/T-REC-P.863-201803-I/en. [Online]. Available: https://www.itu.int/rec/T-REC-P.863-201803-I/en

work page 2018
[25]

An algorithm for predicting the intelligibility of speech masked by modulated noise maskers,

J. Jensen and C. H. Taal, “An algorithm for predicting the intelligibility of speech masked by modulated noise maskers,”IEEE/ACM Transac- tions on Audio, Speech, and Language Processing, vol. 24, no. 11, pp. 2009–2022, 2016

work page 2009
[26]

Objective measures of perceptual audio quality reviewed: An evaluation of their application domain dependence,

M. Torcoli, T. Kastner, and J. Herre, “Objective measures of perceptual audio quality reviewed: An evaluation of their application domain dependence,”IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1530–1541, 2021

work page 2021
[27]

Roformer: En- hanced transformer with rotary position embedding,

J. Su, M. Ahmed, Y . Lu, S. Pan, W. Bo, and Y . Liu, “Roformer: En- hanced transformer with rotary position embedding,”Neurocomputing, vol. 568, p. 127063, 2024

work page 2024
[28]

Ditse: High- fidelity generative speech enhancement via latent diffusion transform- ers,

H. R. Guimar ˜aes, J. Su, R. Kumar, T. H. Falk, and Z. Jin, “Ditse: High- fidelity generative speech enhancement via latent diffusion transform- ers,”arXiv preprint arXiv:2504.09381, 2025

work page arXiv 2025
[29]

Musdb18-hq-an uncompressed version of musdb18,

Z. Rafii, A. Liutkus, F.-R. St ¨oter, S. I. Mimilakis, and R. Bittner, “Musdb18-hq-an uncompressed version of musdb18,”(No Title), 2019

work page 2019
[30]

Learning to groove with inverse sequence transformations,

J. Gillick, A. Roberts, J. Engel, D. Eck, and D. Bamman, “Learning to groove with inverse sequence transformations,” inInternational conference on machine learning. PMLR, 2019, pp. 2269–2279

work page 2019
[31]

N. H. Fletcher and T. D. Rossing,The physics of musical instruments. Springer Science & Business Media, 2012

work page 2012
[32]

Pyroomacoustics: A python package for audio room simulation and array processing algorithms,

R. Scheibler, E. Bezzam, and I. Dokmani ´c, “Pyroomacoustics: A python package for audio room simulation and array processing algorithms,” in 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2018, pp. 351–355

work page 2018
[33]

Openair: An interactive auralization web resource and database,

S. Shelley and D. T. Murphy, “Openair: An interactive auralization web resource and database,” in129th Audio Engineering Society Convention 2010, 2010, pp. 1270–1278

work page 2010
[34]

Storm: A diffusion-based stochastic regeneration model for speech enhancement and dereverberation,

J.-M. Lemercier, J. Richter, S. Welker, and T. Gerkmann, “Storm: A diffusion-based stochastic regeneration model for speech enhancement and dereverberation,”IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2724–2737, 2023

work page 2023
[35]

Input perturbation reduces exposure bias in diffusion models,

M. Ning, E. Sangineto, A. Porrello, S. Calderara, and R. Cucchiara, “Input perturbation reduces exposure bias in diffusion models,”arXiv preprint arXiv:2301.11706, 2023

work page arXiv 2023
[36]

auraloss: Audio focused loss functions in pytorch,

C. J. Steinmetz and J. D. Reiss, “auraloss: Audio focused loss functions in pytorch,” inDigital music research network one-day workshop (DMRN+ 15), 2020, p. 124

work page 2020
[37]

Learning complex spectral mapping with gated convolutional recurrent networks for monaural speech enhancement,

K. Tan and D. Wang, “Learning complex spectral mapping with gated convolutional recurrent networks for monaural speech enhancement,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 380–390, 2019

work page 2019
[38]

Sdr–half-baked or well done?

J. Le Roux, S. Wisdom, H. Erdogan, and J. R. Hershey, “Sdr–half-baked or well done?” inICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 626–630

work page 2019
[39]

A similarity measure for automatic audio classification,

J. Foote, “A similarity measure for automatic audio classification,” in Proc. AAAI 1997 Spring Symposium on Intelligent Integration and Use of Text, Image, Video, and Audio Corpora, vol. 3, 1997

work page 1997
[40]

A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech,

T. H. Falk, C. Zheng, and W.-Y . Chan, “A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech,”IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 7, pp. 1766–1774, 2010

work page 2010
[41]

On the minimum audible difference in direct-to-reverberant energy ratio,

E. Larsen, N. Iyer, C. R. Lansing, and A. S. Feng, “On the minimum audible difference in direct-to-reverberant energy ratio,”The Journal of the Acoustical Society of America, vol. 124, no. 1, pp. 450–461, 2008

work page 2008
[42]

librosa: Audio and music signal analysis in python

B. McFee, C. Raffel, D. Liang, D. P. Ellis, M. McVicar, E. Battenberg, and O. Nieto, “librosa: Audio and music signal analysis in python.” SciPy, vol. 2015, pp. 18–24, 2015

work page 2015
[43]

Mir eval: A transparent implementation of common mir metrics

C. Raffel, B. McFee, E. J. Humphrey, J. Salamon, O. Nieto, D. Liang, D. P. Ellis, and C. C. Raffel, “Mir eval: A transparent implementation of common mir metrics.” inISMIR, vol. 10, 2014, p. 2014

work page 2014
[44]

Moisesdb: A dataset for source separation beyond 4-stems,

I. Pereira, F. Ara ´ujo, F. Korzeniowski, and R. V ogl, “Moisesdb: A dataset for source separation beyond 4-stems,”arXiv preprint arXiv:2307.15913, 2023

work page arXiv 2023