Mask polarization restores bimodality in SE model predictions via Wasserstein distance at test time, delivering consistent gains across domain shifts and architectures.
Test-Time Adaptation For Speech Enhancement Via Mask Polarization
abstract
Adapting speech enhancement (SE) models to unseen environments is crucial for practical deployments, yet test-time adaptation (TTA) for SE remains largely under-explored due to a lack of understanding of how SE models degrade under domain shifts. We observe that mask-based SE models lose confidence under domain shifts: predicted masks flatten and no longer decisively preserve speech or suppress noise. Based on this insight, we propose mask polarization (MPol), a lightweight TTA method that restores mask bimodality through distribution comparison using the Wasserstein distance. MPol requires no additional parameters beyond the trained model, making it suitable for resource-constrained edge deployments. Experimental results across diverse domain shifts and architectures demonstrate that MPol achieves consistent gains that are competitive with significantly more complex approaches.
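To make the idea of comparing a predicted mask against a bimodal distribution concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the target distribution or the exact loss, so the thresholded target (`bimodal_target`), the 0.5 threshold, and the function names are all hypothetical. It relies only on the standard fact that the 1-D Wasserstein-1 distance between two equal-size empirical samples is the mean absolute difference of their sorted values.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if len(a) != len(b):
        raise ValueError("samples must have the same length")
    # In 1-D, optimal transport pairs the sorted values of both samples.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def bimodal_target(mask, threshold=0.5):
    """Hypothetical target (an assumption): snap each mask value to 0 or 1."""
    return [1.0 if m >= threshold else 0.0 for m in mask]


def polarization_loss(mask, threshold=0.5):
    """Distance between the mask values and their polarized counterpart."""
    return wasserstein_1d(mask, bimodal_target(mask, threshold))


# A flattened (low-confidence) mask versus a decisive, near-binary one.
flat_mask = [0.4, 0.45, 0.5, 0.55, 0.6, 0.5]
sharp_mask = [0.02, 0.05, 0.97, 0.99, 0.01, 0.95]

flat_loss = polarization_loss(flat_mask)
sharp_loss = polarization_loss(sharp_mask)
```

In this toy setting the flattened mask sits much farther from its polarized target than the near-binary mask does, which is the kind of signal a test-time objective could minimize. An actual adaptation loop would need a differentiable version of this computation inside a deep-learning framework and a choice of which model parameters to update; neither is described in the abstract.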
fields: eess.AS (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
- Test-Time Adaptation For Speech Enhancement Via Mask Polarization: Mask polarization restores bimodality in SE model predictions via Wasserstein distance at test time, delivering consistent gains across domain shifts and architectures.