pith. machine review for the scientific record.

arxiv: 2604.24199 · v2 · submitted 2026-04-27 · 💻 cs.SD · cs.AI · eess.AS · eess.SP

Recognition: 2 theorem links · Lean Theorem

Speech Enhancement Based on Drifting Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 07:03 UTC · model grok-4.3

classification 💻 cs.SD · cs.AI · eess.AS · eess.SP

keywords: speech enhancement · DriftSE · drifting models · generative models · one-step inference · distribution matching · unpaired training

The pith

DriftSE performs high-fidelity speech enhancement in one step by evolving the pushforward distribution of noisy inputs to match the clean speech distribution via a learned drifting field.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DriftSE, a generative framework that recasts speech enhancement as an equilibrium problem solved by a drifting field rather than iterative sampling. The field acts as a correction vector that drives the distribution of mapped noisy observations directly to the high-density regions of clean speech, supporting both direct mappings and stochastic generation from a Gaussian prior. This setup permits training on unpaired data through distribution matching instead of paired sample supervision. A sympathetic reader would care because the method claims, on the VoiceBank-DEMAND benchmark, to deliver quality superior to multi-step diffusion approaches while using only a single forward pass.
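The unpaired-training idea above can be made concrete with any sample-based distribution-matching criterion. As an illustration only (DriftSE's actual objective is its drifting loss, not MMD), the sketch below computes a kernel maximum mean discrepancy between hypothetical enhancer output features and unpaired clean features; all shapes, bandwidths, and distributions are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def mmd2(x, y, bandwidth=4.0):
    """Squared maximum mean discrepancy with a Gaussian kernel: a standard
    sample-based way to compare two distributions without pairing samples.
    Shown only to illustrate unpaired distribution matching, not the
    paper's drifting loss."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

clean = rng.normal(0.0, 1.0, size=(200, 8))       # stand-in clean features
matched = rng.normal(0.0, 1.0, size=(200, 8))     # output matching the clean law
mismatched = rng.normal(2.0, 1.0, size=(200, 8))  # output that drifted off-target

print(mmd2(clean, matched), mmd2(clean, mismatched))
```

No paired (noisy, clean) examples enter the criterion: only the two sample sets are compared, which is what makes training on unpaired data possible in principle.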

Core claim

DriftSE formulates denoising as an equilibrium problem where a drifting field evolves the pushforward distribution of a mapping function to directly match the clean speech distribution, enabling native one-step inference without iterative refinement.
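A minimal way to see what "native one-step inference" buys is to count network function evaluations (NFE). The sketch below is not the paper's model: a fixed linear map stands in for the trained network fθ, and the iterative baseline is a generic refinement loop, purely to contrast 1 NFE against many.

```python
import numpy as np

def f_theta(y):
    # Stand-in for the trained enhancement network (hypothetical linear map).
    return 0.8 * y

def enhance_one_step(y):
    """DriftSE-style inference: a single forward pass (1 NFE)."""
    return f_theta(y), 1

def enhance_iterative(y, n_steps=30):
    """Diffusion-style inference: one network call per refinement step."""
    x, nfe = y, 0
    for _ in range(n_steps):
        x = x + (f_theta(x) - x) / n_steps  # toy refinement step
        nfe += 1
    return x, nfe

y = np.random.default_rng(0).normal(size=257)  # mock noisy spectrogram frame
_, nfe_direct = enhance_one_step(y)
_, nfe_diffusion = enhance_iterative(y)
print(nfe_direct, nfe_diffusion)  # → 1 30
```

The inference-cost gap is exactly this NFE ratio; the open question the review weighs is whether quality survives the collapse to one evaluation.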

What carries the argument

The drifting field: a learned correction vector that guides samples from the pushforward distribution toward high-density regions of the clean speech distribution.

Load-bearing premise

A learned drifting field can reliably evolve the pushforward distribution to match the clean speech distribution in one step without paired samples or iterative refinement.

What would settle it

An experiment showing that DriftSE produces lower perceptual quality than multi-step diffusion models on VoiceBank-DEMAND or requires multiple steps to reach comparable results would falsify the single-step claim.

Figures

Figures reproduced from arXiv: 2604.24199 by Bastiaan Kleijn, Diego Caviedes-Nozal, Liang Xu, Longfei Felix Yan, Rasmus Kongsgaard Olsson.

Figure 1
Figure 1: Overview of the DriftSE framework (illustrating the Direct Mapping formulation). … force pulling zi toward the clean distribution Z+ and a repulsion force pushing it away from the current generated distribution Z−, driving fθ toward equilibrium. Training Objective: To capture hierarchical speech structures, the base drifting loss from (9) is computed and aggregated across multiple layers l ∈ S of the l… view at source ↗
Figure 2
Figure 2: Evolution of frame-level distributions in the DistilHuBERT semantic space for a fixed test utterance. Each panel displays 2D density contours (PCA projection) derived from all frames across different training epochs. Stars denote the corresponding centroids, which represent the mean of all projected frames. As training progresses, the generated distribution shifts from the noisy distribution toward the cle… view at source ↗
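The attraction/repulsion picture in the Figure 1 caption can be caricatured with a mean-shift-style field (cf. reference [30]): each generated sample is pulled toward kernel-weighted clean samples and pushed away from the current generated set, and updating the whole generated set drives its distribution toward the clean one, with the drift vanishing at equilibrium. This is a 2-D toy, not the paper's loss (its Eq. 9 and multi-layer aggregation are not reproduced); the bandwidth, step size, and cluster locations are all made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def drift(z, clean, generated, bandwidth=1.0):
    """Toy drifting field: kernel-weighted attraction toward clean samples
    (Z+) minus kernel-weighted repulsion from the generated set (Z-)."""
    def kernel_mean(points):
        w = np.exp(-((points - z) ** 2).sum(axis=1) / (2 * bandwidth ** 2))
        return (w[:, None] * (points - z)).sum(axis=0) / (w.sum() + 1e-12)
    return kernel_mean(clean) - kernel_mean(generated)

clean = rng.normal(loc=[3.0, 0.0], scale=0.3, size=(128, 2))      # Z+
generated = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(128, 2))  # Z-

# Training-time evolution of the pushforward samples: at equilibrium the
# generated set overlaps the clean set and the two kernel terms cancel.
for _ in range(40):
    updates = np.array([drift(z, clean, generated) for z in generated])
    generated = generated + 0.5 * updates

print(generated.mean(axis=0))  # centroid has moved near the clean cluster
```

Note that the correction is applied during training, so a single mapping evaluated once at test time can inherit the matched distribution, which is the one-step story the figures illustrate.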
read the original abstract

We propose Speech Enhancement based on Drifting Models (DriftSE), a novel generative framework that formulates denoising as an equilibrium problem. Rather than relying on iterative sampling, DriftSE natively achieves one-step inference by evolving the pushforward distribution of a mapping function to directly match the clean speech distribution. This evolution is driven by a Drifting Field, a learned correction vector that guides samples toward the high-density regions of the clean distribution, which naturally facilitates training on unpaired data by matching distributions rather than paired samples. We investigate the framework under two formulations: a direct mapping from the noisy observation, and a stochastic conditional generative model from a Gaussian prior. Experiments on the VoiceBank-DEMAND benchmark demonstrate that DriftSE achieves high-fidelity enhancement in a single step, outperforming multi-step diffusion baselines and establishing a new paradigm for speech enhancement.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces DriftSE, a generative framework for speech enhancement that formulates denoising as an equilibrium problem. It employs a learned Drifting Field to evolve the pushforward distribution of either a direct mapping from noisy observations or a stochastic conditional model from a Gaussian prior, directly matching the clean speech distribution. This enables one-step inference and unpaired training, with experiments on VoiceBank-DEMAND reporting outperformance over multi-step diffusion baselines.

Significance. If the one-step fidelity claim holds under rigorous validation, the work would be significant for reducing inference cost from iterative sampling to a single forward pass while matching or exceeding diffusion quality, potentially shifting speech enhancement toward direct distribution-matching paradigms.

major comments (2)
  1. [Abstract] The load-bearing claim that DriftSE 'achieves high-fidelity enhancement in a single step' without paired samples or iteration rests on the Drifting Field producing semantically faithful mappings. However, the equilibrium formulation via marginal distribution matching admits many transport maps and provides no explicit penalty for content alteration or phase distortion, errors that would be invisible to the loss yet audible in speech.
  2. [Experiments] The reported benchmark outperformance lacks error bars, ablation studies comparing the two formulations (direct mapping vs. stochastic conditional), and content-preservation metrics (e.g., reference-based PESQ or STOI against the original clean signals). This leaves the support for the central one-step claim moderate, as the unpaired-training assumption remains untested against content drift.
minor comments (1)
  1. [Abstract] The Drifting Field is described only qualitatively as 'a learned correction vector'; a formal definition or equation would clarify how it drives the pushforward evolution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing clarifications and proposing revisions to strengthen the paper where appropriate.

read point-by-point responses
  1. Referee: [Abstract] The load-bearing claim that DriftSE 'achieves high-fidelity enhancement in a single step' without paired samples or iteration rests on the Drifting Field producing semantically faithful mappings. However, the equilibrium formulation via marginal distribution matching admits many transport maps and provides no explicit penalty for content alteration or phase distortion, errors that would be invisible to the loss yet audible in speech.

    Authors: We appreciate this observation regarding the theoretical properties of marginal distribution matching. In the direct-mapping formulation, the Drifting Field is explicitly conditioned on the noisy observation, which constrains the learned transport to be input-dependent and helps preserve semantic content in practice rather than allowing arbitrary maps. Nevertheless, we acknowledge that the current formulation lacks an explicit penalty for content alteration or phase distortion. In the revised manuscript, we will add a dedicated discussion of these limitations, including potential audible effects, and will revise the abstract to qualify the high-fidelity claim more precisely. revision: partial

  2. Referee: [Experiments] The reported benchmark outperformance lacks error bars, ablation studies comparing the two formulations (direct mapping vs. stochastic conditional), and content-preservation metrics (e.g., reference-based PESQ or STOI against the original clean signals). This leaves the support for the central one-step claim moderate, as the unpaired-training assumption remains untested against content drift.

    Authors: We thank the referee for highlighting these gaps in the experimental section. We will add error bars to all reported metrics to convey statistical reliability. We will also include ablation studies that directly compare the direct-mapping and stochastic-conditional formulations. To address content preservation, we will report reference-based PESQ and STOI scores on the VoiceBank-DEMAND test set (which provides clean references for evaluation). Finally, we will include an explicit analysis of the unpaired-training regime, demonstrating that content drift remains limited under the learned drifting field. These additions will be incorporated in the revised version. revision: yes
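The content-preservation metrics promised above (PESQ, STOI) require dedicated reference implementations, but the family of reference-based metrics they belong to is easy to illustrate. Below is a sketch of scale-invariant SDR, in the spirit of reference [39] ('SDR – half-baked or well done?'); the signals are synthetic stand-ins, not VoiceBank-DEMAND audio, and the noise levels are arbitrary.

```python
import numpy as np

def si_sdr(reference, estimate):
    """Scale-invariant SDR in dB: a reference-based metric that rewards
    content preservation and, unlike plain SDR, ignores overall gain."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to isolate the target component.
    s_target = (estimate @ reference) / (reference @ reference) * reference
    e_noise = estimate - s_target
    return 10 * np.log10((s_target @ s_target) / (e_noise @ e_noise))

rng = np.random.default_rng(0)
clean = rng.normal(size=16000)                      # 1 s of mock audio at 16 kHz
good = 0.5 * clean + 0.01 * rng.normal(size=16000)  # rescaled, lightly noised
bad = clean + 0.5 * rng.normal(size=16000)          # heavily corrupted

print(si_sdr(clean, good), si_sdr(clean, bad))
# the lightly-noised estimate scores far higher than the corrupted one
```

A metric of this reference-based kind, computed against the held-out clean signals, is what would expose content drift that an unpaired distribution-matching loss cannot see.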

Circularity Check

0 steps flagged

No circularity: derivation relies on independent distribution matching

full rationale

The paper formulates speech enhancement as learning a drifting field to evolve a pushforward distribution to match the clean speech distribution, enabling one-step inference on unpaired data. No equations or steps in the provided description reduce the one-step high-fidelity claim to a fitted quantity by construction, a self-citation chain, or a renamed input. The central mechanism (distribution matching via a learned vector field) is presented as an external training objective whose success is evaluated on VoiceBank-DEMAND benchmarks, leaving the fidelity guarantee as an empirical claim rather than a definitional tautology. The claim is thus tested against external data rather than being self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the assumption that a learned drifting field can drive distribution matching without paired data; the drifting field itself is a new postulated mechanism whose parameters are fitted during training.

free parameters (1)
  • Drifting Field parameters
    The correction vector is learned from data to guide samples to high-density regions.
axioms (1)
  • Domain assumption: The pushforward distribution of the mapping function can be evolved via the drifting field to match the clean speech distribution.
    This is the core modeling premise stated in the abstract for the equilibrium problem.
invented entities (1)
  • Drifting Field (no independent evidence)
    purpose: A learned correction vector that guides noisy samples toward high-density regions of the clean distribution in one step.
    New concept introduced to enable the equilibrium formulation and one-step inference.

pith-pipeline@v0.9.0 · 5457 in / 1235 out tokens · 51298 ms · 2026-05-13T07:03:51.407895+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 3 internal anchors

  1. [1]

    Introduction: The field of Speech Enhancement (SE) has evolved significantly over recent decades, progressing from classical statistical signal processing techniques like Wiener filtering [1, 2] to modern deep learning. Discriminative models, such as RNNs [3], LSTMs [4], and complex spectral mapping [5], effectively suppress noise but often yield spe...

  2. [2]

    Drifting Models: We briefly review Drifting Models [23], which formulate generative modeling as the training-time evolution of a pushforward distribution. 2.1. Pushforward and Equilibrium: Given a simple source distribution pϵ (e.g., standard Gaussian noise N(0, I)), the drift approach takes a sample ϵ ∼ pϵ with ϵ ∈ R^d, and maps it through a parameterized func...

  3. [3]

    Speech Enhancement via Latent Drifting: We propose DriftSE, which formulates speech enhancement as an equilibrium problem (Fig. 1). By evolving the mapping function's pushforward distribution to match the clean speech distribution, DriftSE achieves native one-step denoising (1 NFE). 3.1. Two Enhancement Paradigms: Let y ∈ C^{F×T} denote the complex spectrogr...

  4. [4]

    unpaired

    Experiments: In this section, we evaluate DriftSE against state-of-the-art iterative and one-step baselines, and perform ablation studies to analyze the contribution of each design choice. 4.1. Experimental Setup. Datasets: We train on clean speech from the VoiceBank corpus [28] and noise recordings from the DEMAND dataset [29]. During training, we empl...

  5. [5]

    By utilizing a latent drifting field, DriftSE evolves the mapping function’s pushforward distribution to directly match the clean speech distribution during training

    Conclusion: In this paper, we introduced Speech Enhancement based on Drifting Models (DriftSE), a novel paradigm that reformulates denoising as an equilibrium problem to enable native one-step generation. By utilizing a latent drifting field, DriftSE evolves the mapping function's pushforward distribution to directly match the clean speech distribution dur...

  6. [6]

    Generative AI Use Disclosure: We acknowledge the ISCA policy stating that generative AI tools cannot serve as co-authors and should only be used for editing or polishing rather than producing significant parts of this paper. Although the proposed method is a novel generative model for speech enhancement, the authors declare that no generative AI tools we...

  7. [7]

    Multi-channel Speech Enhancement in a Car Environment Using Wiener Filtering and Spectral Subtraction

    J. Meyer and K. U. Simmer, “Multi-channel Speech Enhancement in a Car Environment Using Wiener Filtering and Spectral Subtraction,” in ICASSP, vol. 2. IEEE, 1997, pp. 1167–1170

  8. [8]

    An Effective MVDR Post-Processing Method for Low-Latency Convolutive Blind Source Separation

    J. Chua, L. F. Yan, and W. B. Kleijn, “An Effective MVDR Post-Processing Method for Low-Latency Convolutive Blind Source Separation,” in 2024 18th International Workshop on Acoustic Signal Enhancement (IWAENC). IEEE, 2024, pp. 130–134

  9. [9]

    Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR

    F. Weninger, J. R. Hershey, J. L. Roux, and B. Schuller, “Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR,” in Latent Variable Analysis and Signal Separation (LVA/ICA), vol. 9237, Aug. 2015, pp. 91–99

  10. [10]

    A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement

    K. Tan and D. Wang, “A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement,” in Proc. Interspeech, 2018, pp. 3229–3233

  11. [11]

    DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement

    Y. Hu, Y. Liu, S. Lyu, M. Xing, S. Zhang, Y. Fu, J. Fan, and L. Xie, “DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement,” in Proc. Interspeech, 2020, pp. 2472–2476

  12. [12]

    SEGAN: Speech Enhancement Generative Adversarial Network

    S. Pascual, A. Bonafonte, and J. Serrà, “SEGAN: Speech Enhancement Generative Adversarial Network,” in Proc. Interspeech, 2017, pp. 3642–3646

  13. [13]

    MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement

    S.-W. Fu, C.-F. Liao, Y. Tsao, and S.-D. Lin, “MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement,” in International Conference on Machine Learning. PMLR, 2019, pp. 2031–2041

  14. [14]

    HiFi-GAN-2: Studio-Quality Speech Enhancement via Generative Adversarial Networks Conditioned on Acoustic Features

    J. Su, Z. Jin, and A. Finkelstein, “HiFi-GAN-2: Studio-Quality Speech Enhancement via Generative Adversarial Networks Conditioned on Acoustic Features,” in 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE, 2021, pp. 166–170

  15. [15]

    Speech Enhancement and Dereverberation With Diffusion-Based Generative Models

    J. Richter, S. Welker, J.-M. Lemercier, B. Lay, and T. Gerkmann, “Speech Enhancement and Dereverberation With Diffusion-Based Generative Models,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2351–2364, 2023

  16. [16]

    Score-Based Generative Modeling through Stochastic Differential Equations

    Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-Based Generative Modeling through Stochastic Differential Equations,” in ICLR, 2021

  17. [17]

    StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

    J.-M. Lemercier, J. Richter, S. Welker, and T. Gerkmann, “StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2724–2737, 2023

  18. [18]

    Thunder: Unified Regression-Diffusion Speech Enhancement with a Single Reverse Step using Brownian Bridge

    T. Trachu, C. Piansaddhayanon, and E. Chuangsuwanich, “Thunder: Unified Regression-Diffusion Speech Enhancement with a Single Reverse Step using Brownian Bridge,” in Proc. Interspeech, 2024, pp. 1180–1184

  19. [19]

    Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement

    S. Han, S. Lee, J. Lee, and K. Lee, “Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement,” in Proc. Interspeech, 2025, pp. 2380–2384

  20. [20]

    Consistency Models

    Y. Song, P. Dhariwal, M. Chen, and I. Sutskever, “Consistency Models,” in Proceedings of the 40th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 202. PMLR, 23–29 Jul 2023, pp. 32211–32252

  21. [21]

    Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion

    D. Kim, C.-H. Lai, W.-H. Liao, N. Murata, Y. Takida, T. Uesaka, Y. He, Y. Mitsufuji, and S. Ermon, “Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion,” in ICLR, 2024

  22. [22]

    Robust One-Step Speech Enhancement via Consistency Distillation

    L. Xu, L. F. Yan, and W. B. Kleijn, “Robust One-Step Speech Enhancement via Consistency Distillation,” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Tahoe City, CA, USA: IEEE, Oct. 2025

  23. [23]

    Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement

    S. Nishigori, K. Saito, N. Murata, M. Hirano, S. Takahashi, and Y. Mitsufuji, “Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement,” in 2025 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE, 2025, pp. 1–5

  24. [24]

    Flow Matching for Generative Modeling

    Y. Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le, “Flow Matching for Generative Modeling,” in ICLR, 2023

  25. [25]

    FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching

    Z. Wang, Z. Liu, X. Zhu, Y. Zhu, M. Liu, J. Chen, L. Xiao, C. Weng, and L. Xie, “FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching,” in Proc. Interspeech, 2025

  26. [26]

    Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

    X. Liu, C. Gong, and Q. Liu, “Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow,” in ICLR, 2023

  27. [27]

    Mean Flows for One-step Generative Modeling

    Z. Geng, M. Deng, X. Bai, Z. Kolter, and K. He, “Mean Flows for One-step Generative Modeling,” in NeurIPS, 2025

  28. [28]

    Improved Mean Flows: On the Challenges of Fastforward Generative Models

    Z. Geng, Y. Lu, Z. Wu, E. Shechtman, J. Z. Kolter, and K. He, “Improved Mean Flows: On the Challenges of Fastforward Generative Models,” arXiv preprint arXiv:2512.02012, 2025

  29. [29]

    Generative Modeling via Drifting

    M. Deng, H. Li, T. Li, Y. Du, and K. He, “Generative Modeling via Drifting,” arXiv preprint arXiv:2602.04770, 2026

  30. [30]

    Mean shift, mode seeking, and clustering

    Y. Cheng, “Mean shift, mode seeking, and clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 8, pp. 790–799, 1995

  31. [31]

    HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

    W.-N. Hsu, B. Bolte, Y.-H. H. Tsai, K. Lakhotia, R. Salakhutdinov, and A.-r. Mohamed, “HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3451–3460, 2021

  32. [32]

    WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

    S. Chen, C. Wang, Z. Chen, Y. Wu, S. Liu, Z. Chen, J. Li, N. Kanda, T. Yoshioka, X. Xiao et al., “WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing,” IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 6, pp. 1505–1518, 2022

  33. [33]

    DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT

    H.-J. Chang, S.-w. Yang, and H.-y. Lee, “DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT,” in ICASSP. IEEE, 2022, pp. 7087–7091

  34. [34]

    Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech

    C. Valentini-Botinhao, X. Wang, S. Takaki, and J. Yamagishi, “Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech,” in 9th ISCA Speech Synthesis Workshop (SSW 9), 2016, pp. 146–152

  35. [35]

    The Diverse Environments Multi-channel Acoustic Noise Database (DEMAND): A database of multichannel environmental noise recordings

    J. Thiemann, N. Ito, and E. Vincent, “The Diverse Environments Multi-channel Acoustic Noise Database (DEMAND): A database of multichannel environmental noise recordings,” in Proceedings of Meetings on Acoustics, vol. 19, no. 1. AIP Publishing, 2013

  36. [36]

    The Interspeech 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results

    C. K. A. Reddy, V. Gopal, R. Cutler, E. Beyrami, R. Cheng, H. Dubey, S. Matusevych, R. Aichner, A. Aazami, S. Braun, P. Rana, S. Srinivasan, and J. Gehrke, “The Interspeech 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results,” in Proc. Interspeech, 2020, pp. 2492–2496

  37. [37]

    Perceptual Evaluation of Speech Quality (PESQ): A New Method for Speech Quality Assessment of Telephone Networks and Codecs

    A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, “Perceptual Evaluation of Speech Quality (PESQ): A New Method for Speech Quality Assessment of Telephone Networks and Codecs,” in ICASSP, vol. 2. IEEE, 2001, pp. 749–752

  38. [38]

    An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers

    J. Jensen and C. H. Taal, “An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 11, pp. 2009–2022, 2016

  39. [39]

    SDR – half-baked or well done?

    J. Le Roux, S. Wisdom, H. Erdogan, and J. R. Hershey, “SDR – half-baked or well done?” in ICASSP, 2019, pp. 626–630

  40. [40]

    SCOREQ: Speech Quality Assessment with Contrastive Regression

    A. Ragano, J. Skoglund, and A. Hines, “SCOREQ: Speech Quality Assessment with Contrastive Regression,” in NeurIPS, vol. 37, 2024, pp. 105702–105729

  41. [41]

    DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors

    C. K. A. Reddy, V. Gopal, and R. Cutler, “DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors,” in ICASSP. IEEE, August 2021

  42. [42]

    DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors

    C. K. Reddy, V. Gopal, and R. Cutler, “DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors,” in ICASSP. IEEE, 2022, pp. 886–890

  43. [43]

    HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement

    P. Andreev, A. Alanov, O. Ivanov, and D. Vetrov, “HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement,” in ICASSP. IEEE, 2023, pp. 1–5

  44. [44]

    MeanFlowSE: one-step generative speech enhancement via conditional mean flow

    D. Li, S. Lu, H. Pan, Z. Zhan, Q. Hong, and L. Li, “MeanFlowSE: one-step generative speech enhancement via conditional mean flow,” in ICASSP, 2026

  45. [45]

    MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

    S.-W. Fu, C. Yu, Y. Tsao, X. Lu, and H. Kawahara, “MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement,” in Proc. Interspeech, 2021, pp. 201–205

  46. [46]

    UNIVERSE++: Universal Score-based Speech Enhancement with High Content Preservation

    R. Scheibler, Y. Fujita, Y. Shirahata, and T. Komatsu, “UNIVERSE++: Universal Score-based Speech Enhancement with High Content Preservation,” in Proc. Interspeech, 2024