Insights into deep non-linear filters for improved multi-channel speech enhancement

2023 · DOI 10.1109/taslp

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion

cs.GR · 2026-01-29 · unverdicted · novelty 7.0

JUST-DUB-IT adapts a joint audio-visual diffusion model via LoRA to generate high-quality dubbed videos with translated audio and lip-synced facial motion.

GRM: Utility-Aware Jailbreak Attacks on Audio LLMs via Gradient-Ratio Masking

cs.SD · 2026-04-10 · unverdicted · novelty 6.0

GRM ranks Mel bands by attack contribution versus utility sensitivity, perturbs a subset, and learns a universal perturbation to reach 88.46% average jailbreak success rate with improved attack-utility trade-off on four audio LLMs.

EchoAvatar: Real-time Generative Avatar Animation from Audio Streams

cs.CV · 2026-05-27 · unverdicted · novelty 5.0

EchoAvatar presents a streaming architecture for low-latency full-body animation from incremental audio, with RL refinement and LLM tool-call control, outperforming real-time baselines.

MOSS-Audio Technical Report

cs.SD · 2026-06-01 · unverdicted · novelty 4.0

MOSS-Audio is an audio-language model using a 12.5 Hz encoder, DeepStack cross-layer injection, time markers, and an event-preserving annotation pipeline for unified audio understanding.

RADAR Challenge 2026: Robust Audio Deepfake Recognition under Media Transformations

eess.AS · 2026-05-10 · unverdicted · novelty 4.0 · 3 refs

RADAR Challenge 2026 organizes a multilingual audio deepfake detection benchmark under media transformations, evaluated with the equal error rate (EER) metric, and reports participation from 33 development and 22 evaluation teams.

Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

cs.CL · 2026-01-08 · unverdicted · novelty 4.0

Qwen3-VL-Embedding-8B achieves state-of-the-art performance with a 77.8 overall score on the MMEB-V2 multimodal embedding benchmark.

Towards the Anonymization of the Language Modeling

cs.CL · 2025-01-05 · unverdicted · novelty 4.0

The authors introduce MLM and CLM specialization methods that avoid memorizing identifiers in sensitive training data while aiming for a privacy-utility tradeoff on medical datasets.

The Master-Slave Encoder Model for Improving Patent Text Summarization: A New Approach to Combining Specifications and Claims

cs.CL · 2024-11-21 · unverdicted · novelty 4.0

MSEA uses a master-slave encoder architecture on patent specifications and claims, enhanced with pointer networks and repetition suppression, to generate better summaries as measured by small ROUGE score gains.

Spatial Speech Perception Systems: A Survey of Sound Source Localization, Directional Enhancement, and Speech Recognition

eess.AS · 2026-07-02 · unverdicted · novelty 2.0

A survey of spatial speech perception systems covering sound source localization, directional enhancement, and automatic speech recognition methods and their integration.

citing papers explorer

Showing 9 of 9 citing papers.

JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion
cs.GR · 2026-01-29 · unverdicted · none · ref 15

GRM: Utility-Aware Jailbreak Attacks on Audio LLMs via Gradient-Ratio Masking
cs.SD · 2026-04-10 · unverdicted · none · ref 12

EchoAvatar: Real-time Generative Avatar Animation from Audio Streams
cs.CV · 2026-05-27 · unverdicted · none · ref 2

MOSS-Audio Technical Report
cs.SD · 2026-06-01 · unverdicted · none · ref 4

RADAR Challenge 2026: Robust Audio Deepfake Recognition under Media Transformations
eess.AS · 2026-05-10 · unverdicted · none · ref 2 · 3 links

Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking
cs.CL · 2026-01-08 · unverdicted · none · ref 9

Towards the Anonymization of the Language Modeling
cs.CL · 2025-01-05 · unverdicted · none · ref 38

The Master-Slave Encoder Model for Improving Patent Text Summarization: A New Approach to Combining Specifications and Claims
cs.CL · 2024-11-21 · unverdicted · none · ref 144

Spatial Speech Perception Systems: A Survey of Sound Source Localization, Directional Enhancement, and Speech Recognition
eess.AS · 2026-07-02 · unverdicted · none · ref 66
