SenSE: Semantic-Aware High-Fidelity Universal Speech Enhancement

Xingchen Li , Hanke Xie , Ziqian Wang , Zihan Zhang , Longshuai Xiao , Shuai Wang , Lei Xie

Authors on Pith no claims yet

classification 📡 eess.AS

keywords speechenhancementgenerativesensesemanticuniversalconditioningflow

read the original abstract

Generative Universal Speech Enhancement (USE) methods aim to leverage generative models to improve speech quality under various types of distortions. However, existing generative speech enhancement methods often suffer from semantic inconsistency in the generated outputs. Therefore, we propose SenSE, a novel two-stage generative universal speech enhancement framework, by modeling semantic priors with a language model, the flow matching-based speech enhancement process is guided to generate semantically faithful speech, thereby effectively improving context fidelity. In addition, we introduce a dual-path masked conditioning training strategy that enables flow matching-based enhancement to flexibly integrate multi-source conditioning signals from degraded speech, semantic tokens, and reference speech, thereby improving model flexibility and adaptability. Experimental results demonstrate that SenSE achieves state-of-the-art performance among generative speech enhancement models and exhibits a high performance ceiling, particularly under challenging distortion conditions. Codes and demos are available at https://github.com/ASLP-lab/SenSE.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Reducing Linguistic Hallucination in LM-Based Speech Enhancement via Noise-Invariant Acoustic-Semantic Distillation
eess.AS 2026-05 unverdicted novelty 6.0

L3-SE reduces linguistic hallucination in LM-based speech enhancement by distilling noise-invariant acoustic-semantic representations from noisy inputs to condition an autoregressive decoder-only language model.
UniPASE: A Generative Model for Universal Speech Enhancement with High Fidelity and Low Hallucinations
eess.AS 2026-04 unverdicted novelty 5.0

UniPASE extends the PASE framework with DeWavLM-Omni to convert degraded speech into high-fidelity, low-hallucination audio across sampling rates via phonetic enhancement, acoustic adaptation, and multi-rate vocoding.