The Interspeech 2020 deep noise suppression challenge: Datasets, subjective testing framework, and challenge results. arXiv preprint arXiv:2005.13981
4 Pith papers cite this work.
Representative citing papers
PATSE is a DOA-guided target speaker extraction system that produces speaker-attributed streams for diarization-free ASR in multi-party conversations.
SenSE adds language-model semantic guidance to flow-matching generative speech enhancement via a dual-path masked conditioning strategy and reports SOTA results on distorted speech.
Fast-ULCNet matches original ULCNet speech enhancement quality while cutting model size by more than half and latency by 34% via FastGRNN replacement and a state-drift filter.
citing papers explorer
- SelectTSL: Prompt-Guided Selective Target Sound Localization in Complex Scenarios
  SelectTSL is an end-to-end model using a Prompt-Guided Selective Attention Module and IPD enhancer to localize only prompt-specified target sounds and estimate their count and direction in complex acoustic scenes.
- Position-Aware Target Speaker Extraction for Long-Form Multi-Party Conversations: A Diarization-Free Framework for ASR
  PATSE is a DOA-guided target speaker extraction system that produces speaker-attributed streams for diarization-free ASR in multi-party conversations.
- Fast-ULCNet: A fast and ultra low complexity network for single-channel speech enhancement
  Fast-ULCNet matches original ULCNet speech enhancement quality while cutting model size by more than half and latency by 34% via FastGRNN replacement and a state-drift filter.