NDF+: Joint Neural Directional Filtering and Diffuse Sound Extraction
Pith reviewed 2026-05-08 03:47 UTC · model grok-4.3
The pith
NDF+ jointly reconstructs virtual directional microphones and extracts diffuse sound to enable control over reverberation effects.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NDF+ reformulates VDM estimation into two coupled subtasks of dereverberated VDM reconstruction and diffuse sound extraction. This enables manipulation of diffuse components in the reconstructed VDM output. Under reverberant conditions NDF+ outperforms representative conventional baselines on both subtasks while maintaining VDM reconstruction quality comparable to the original single-task NDF model. In stereo recording applications NDF+ provides controllable inter-channel level differences by adjusting the estimated diffuse component.
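The stereo claim above can be illustrated with a toy recombination sketch, assuming the network outputs a coherent (directional) estimate per channel plus a shared diffuse estimate that are summed with a user-chosen diffuse gain; all names, shapes, and signals below are hypothetical stand-ins, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
F, T = 257, 100  # STFT bins and frames (illustrative sizes)

# Hypothetical network outputs for left/right virtual directional
# microphones: a coherent (directional) part per channel and a shared diffuse part.
z_coh = {"L": rng.standard_normal((F, T)), "R": 0.5 * rng.standard_normal((F, T))}
z_diff = rng.standard_normal((F, T))

def render_channel(ch, alpha):
    """Recombine coherent and scaled diffuse estimates
    (a simplified additive mix with user gain alpha)."""
    return z_coh[ch] + alpha * z_diff

def icld_db(left, right):
    """Inter-channel level difference in dB (energy ratio)."""
    return 10.0 * np.log10(np.sum(np.abs(left) ** 2) / np.sum(np.abs(right) ** 2))

# Lowering the diffuse gain widens the level difference between channels,
# since shared diffuse energy no longer masks the directional imbalance.
for alpha in (1.0, 0.5, 0.0):
    l, r = render_channel("L", alpha), render_channel("R", alpha)
    print(f"alpha={alpha}: ICLD = {icld_db(l, r):+.1f} dB")
```

This is the sense in which adjusting the estimated diffuse component gives a controllable inter-channel level difference.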
What carries the argument
Coupled subtasks of dereverberated virtual directional microphone reconstruction and diffuse sound extraction that together permit independent manipulation of diffuse components in the output.
Load-bearing premise
The neural network can separate directional and diffuse parts of the input signals in reverberant conditions without creating artifacts or forcing a performance trade-off between the two subtasks.
What would settle it
Objective metrics or listening tests on new reverberant recordings in which increasing or decreasing the extracted diffuse component produces no measurable change in inter-channel level differences, or in which overall VDM quality drops below that of the original NDF model.
original abstract
Recently, neural directional filtering (NDF) has been introduced as a flexible approach for reconstructing a virtual directional microphone (VDM) with a desired directivity pattern for spatial sound capture. Building on this idea, we propose NDF+, which enables joint neural directional filtering and diffuse sound extraction. NDF+ reformulates VDM estimation into two coupled subtasks: dereverberated VDM reconstruction and diffuse sound extraction. This reformulation enables NDF+ to manipulate diffuse components in the final reconstructed VDM output. We evaluated NDF+ under reverberant conditions and compared it with representative conventional baselines. Results show that NDF+ consistently outperforms the baselines on both subtasks, while maintaining VDM reconstruction quality comparable to that of the original single-task NDF model. These findings indicate that NDF+ introduces an additional degree of freedom for diffuse sound control in the VDM reconstruction. In a stereo recording application, NDF+ provides controllable inter-channel level differences between left and right channels by adjusting the estimated diffuse component.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces NDF+, which extends neural directional filtering to jointly perform dereverberated virtual directional microphone (VDM) reconstruction and diffuse sound extraction. This coupled formulation allows manipulation of diffuse components in the VDM output. Under reverberant conditions, NDF+ is reported to outperform representative conventional baselines on both subtasks while maintaining VDM reconstruction quality comparable to the single-task NDF model. An application to stereo recording is shown where adjusting the estimated diffuse component provides controllable inter-channel level differences.
Significance. If the results hold, NDF+ offers an additional degree of freedom for diffuse sound control in VDM reconstruction, which is valuable for spatial sound capture applications. The approach builds on the original NDF by reformulating the task into coupled subtasks without apparent loss in VDM quality. This could enable more flexible audio processing pipelines.
major comments (2)
- [Abstract] The central claim that 'NDF+ consistently outperforms the baselines on both subtasks, while maintaining VDM reconstruction quality comparable to that of the original single-task NDF model' is not supported by any quantitative metrics, error bars, specific dataset details, or statistical tests in the abstract. This undermines the ability to assess the soundness of the outperformance and comparability assertions.
- [Evaluation under reverberant conditions] No validation is provided that the joint optimization achieves clean separation of directional and diffuse fields without artifacts or trade-offs in reverberant conditions (where the diffuse field is not perfectly isotropic). There is no mention of oracle diffuse references, listening tests for artifacts, or loss-weight ablations to confirm the VDM directivity pattern remains uncompromised.
minor comments (1)
- [Abstract] The acronym VDM is introduced without an initial parenthetical expansion in the abstract, although it is clarified later as virtual directional microphone.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments on our manuscript. We address each major point below and outline revisions to improve clarity and completeness where feasible.
point-by-point responses
-
Referee: [Abstract] The central claim that 'NDF+ consistently outperforms the baselines on both subtasks, while maintaining VDM reconstruction quality comparable to that of the original single-task NDF model' is not supported by any quantitative metrics, error bars, specific dataset details, or statistical tests in the abstract. This undermines the ability to assess the soundness of the outperformance and comparability assertions.
Authors: We agree that the abstract, as a concise summary, would benefit from greater specificity to allow readers to better evaluate the claims. The main text provides the supporting quantitative results, dataset details, and comparisons, but we will revise the abstract to incorporate representative performance metrics (e.g., improvements in the relevant error measures for each subtask) and a brief reference to the evaluation conditions. This change will be made without altering the overall length or focus of the abstract. revision: yes
-
Referee: [Evaluation under reverberant conditions] No validation is provided that the joint optimization achieves clean separation of directional and diffuse fields without artifacts or trade-offs in reverberant conditions (where the diffuse field is not perfectly isotropic). There is no mention of oracle diffuse references, listening tests for artifacts, or loss-weight ablations to confirm the VDM directivity pattern remains uncompromised.
Authors: The manuscript demonstrates that VDM reconstruction quality remains comparable to the single-task NDF baseline under the same reverberant conditions, which provides evidence that the joint formulation does not introduce measurable trade-offs in directivity. Objective metrics on both subtasks further support effective separation. We did not employ oracle diffuse references because the method operates in a blind setting without access to ground-truth diffuse fields. No formal listening tests were conducted, as the evaluation prioritized quantitative comparisons with conventional baselines. We will add a short discussion of the isotropic diffuse-field assumption and its limitations in reverberation, along with a loss-weight sensitivity analysis to confirm stability of the VDM pattern. New subjective listening tests, however, would require additional resources and are noted as future work. revision: partial
- New subjective listening tests and access to oracle diffuse references cannot be provided without collecting additional data and conducting new experiments beyond the scope of the current manuscript.
Circularity Check
NDF+ extends the prior NDF via joint optimization; no derivation reduces the claimed gains to self-referential inputs.
full rationale
The paper proposes a neural architecture reformulating VDM estimation as coupled dereverberated VDM reconstruction and diffuse extraction, then reports empirical outperformance on reverberant test conditions against conventional baselines while preserving single-task VDM quality. No equations, uniqueness theorems, or fitted-parameter renamings are presented that would make the claimed gains equivalent to the training inputs by construction. The central results rest on data-driven evaluation rather than a closed mathematical loop. A citation to the original NDF work appears but is not load-bearing for the new joint-task claims, which are validated independently.
Axiom & Free-Parameter Ledger
free parameters (1)
- Neural network weights and hyperparameters
axioms (1)
- Domain assumption: audio signals in reverberant rooms can be decomposed into directional and diffuse components that a neural network can jointly estimate.
Reference graph
Works this paper leans on
-
[1]
INTRODUCTION A fixed beamformer (FBF) with an appropriate directivity pattern enables precise spatial rendering of sound sources and preserves key spatial cues, even in multi-source scenarios. However, conventional FBFs, such as differential microphone arrays (DMA) [1, 2] and superdirective beamforming [3], are fundamentally limited by a compact array ...
-
[2]
PROBLEM FORMULATION We consider a compact array of Q omnidirectional microphones recording an acoustic scene with N sound sources in a reverberant room. The array and all sources are assumed to lie in the x-y plane. Let X_{q,n}(f, t) denote the short-time Fourier transform (STFT) coefficient at the q-th microphone due to the n-th source, where f and t denote the frequ...
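The signal model in this excerpt can be made concrete with a minimal STFT sketch: a naive Hann-window implementation on synthetic per-source, per-microphone signals. The 16 kHz rate and all array sizes are illustrative assumptions, not values from the excerpt:

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Naive STFT: Hann-windowed frames of length n_fft with hop-size hop."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1).T  # shape (n_fft // 2 + 1, n_frames)

rng = np.random.default_rng(0)
Q, N, length = 4, 2, 16_000  # mics, sources, samples (illustrative)

# X[q][n] holds the STFT coefficients X_{q,n}(f, t) of the contribution of
# source n at microphone q, as in the problem formulation.
X = [[stft(rng.standard_normal(length)) for _ in range(N)] for _ in range(Q)]
print(X[0][0].shape)  # (257, 61): a 512-point window yields 257 frequency bins
```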
-
[3]
PROPOSED METHOD 3.1. DNN Architecture and Training Loss: The FT-JNF framework [14] used for the NDF task [5, 7] employs two distinct long short-term memory (LSTM) networks to estimate a single complex-valued mask and applies it to a reference channel to estimate a wanted signal. To accommodate estimates for two distinct targets (Z_coh(f, t) and Z_diff(f, t...
-
[4]
EXPERIMENTAL SETUP Configurations: A four-microphone array (Q = 4, diameter 3 cm) was used, consisting of three microphones arranged in a uniform circular array (UCA) and one positioned at the center as the reference microphone. The reference microphone signal served as the first input channel for the NDF+ model. The target direction of the directivity pa...
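The stated geometry, three microphones on a 3 cm-diameter UCA plus a center reference, can be reproduced in a few lines; the coordinate convention and mic ordering are assumptions for illustration:

```python
import numpy as np

# Array geometry from the experimental setup: three mics on a uniform
# circular array (UCA) of 3 cm diameter, plus a center reference microphone.
radius = 0.03 / 2  # metres
angles = 2 * np.pi * np.arange(3) / 3  # UCA: 120-degree spacing (assumed)
uca = radius * np.column_stack([np.cos(angles), np.sin(angles)])
positions = np.vstack([[0.0, 0.0], uca])  # reference mic listed first

for q, (x, y) in enumerate(positions):
    print(f"mic {q}: ({x:+.4f}, {y:+.4f}) m")
```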
-
[5]
The STFT used a 512-point window and a 256-point hop size. Performance measures: We used the signal-to-distortion ratio (SDR) [20] and perceptual evaluation of speech quality (PESQ) [21, 22] to measure the distance between estimated signals and target signals. The obtained directivity patterns were estimated using the method described in [7].
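As a reference point for the SDR metric mentioned here, a simplified energy-ratio definition can be sketched. Note this is a toy version: the BSS-eval SDR the paper cites additionally allows a scaled or filtered target:

```python
import numpy as np

def sdr_db(target, estimate):
    """Simplified signal-to-distortion ratio in dB:
    target energy over residual (target - estimate) energy."""
    residual = target - estimate
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(residual ** 2))

rng = np.random.default_rng(0)
s = rng.standard_normal(16_000)            # hypothetical target signal
noisy = s + 0.1 * rng.standard_normal(16_000)  # estimate with mild distortion
print(f"SDR = {sdr_db(s, noisy):.1f} dB")  # around 20 dB for a 0.1-scaled residual
```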
-
[6]
EXPERIMENTAL RESULTS 5.1. Performance Analysis: The proposed NDF+ model jointly addresses two explicit subtasks: dereverberated VDM reconstruction (Ẑ_coh) and diffuse sound extraction (Ẑ_diff). By achieving both, it implicitly realizes the VDM reconstruction task (Ẑ_vdm) using (7). Table 2 presents the results for various RT60 values. For VDM reconstruc...
-
[7]
CONCLUSIONS We introduce NDF+, a joint framework for neural directional filtering and diffuse sound extraction. NDF+ splits the VDM estimation into dereverberated VDM reconstruction and diffuse sound extraction. It consistently outperforms baselines on both tasks and matches the single-task NDF model for VDM reconstruction. Joint optimization mainta...
-
[8]
ACKNOWLEDGMENTS The authors gratefully acknowledge the scientific support and HPC resources provided by the Erlangen National High Performance Computing Center (NHR@FAU) of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU). The hardware is funded by the German Research Foundation (DFG).
-
[9]
Jacob Benesty and Jingdong Chen, Study and Design of Differential Microphone Arrays, vol. 6, Springer Science & Business Media, 2012.
-
[10]
Jacob Benesty, Jingdong Chen, and Israel Cohen, Design of Circular Differential Microphone Arrays, vol. 12, Springer, 2015.
-
[11]
Joerg Bitzer and K. Uwe Simmer, “Superdirective microphone arrays,” in Microphone Arrays: Signal Processing Techniques and Applications, pp. 19–38, Springer, 2001.
-
[12]
Jacob Benesty, Israel Cohen, and Jingdong Chen, “Fixed beamforming,” Fundamentals of Signal Enhancement and Array Signal Processing, pp. 237–282, 2018.
-
[13]
Julian Wechsler, Srikanth Raj Chetupalli, Mhd Modar Halimeh, Oliver Thiergart, and Emanuël A. P. Habets, “Neural directional filtering: Far-field directivity control with a small microphone array,” in Proc. Intl. Workshop Acoust. Signal Enhancement (IWAENC), IEEE, 2024, pp. 459–463.
-
[14]
Weilong Huang, Mhd Modar Halimeh, Srikanth Raj Chetupalli, Oliver Thiergart, and Emanuël A. P. Habets, “Steerable neural directional filtering,” in Proc. Forum Acusticum Euronoise, European Acoustics Association, 2025.
-
[15]
Weilong Huang, Srikanth Raj Chetupalli, Mhd Modar Halimeh, Oliver Thiergart, and Emanuël A. P. Habets, “Neural directional filtering using a compact microphone array,” arXiv preprint arXiv:2511.07185, 2025.
-
[16]
Weilong Huang, Srikanth Raj Chetupalli, and Emanuël A. P. Habets, “Neural directional filtering with configurable directivity pattern at inference,” arXiv preprint arXiv:2510.20253, 2025.
-
[17]
Jens Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization, MIT Press, 1997.
-
[18]
Christof Faller and Juha Merimaa, “Source localization in complex listening situations: Selection of binaural cues based on interaural coherence,” The Journal of the Acoustical Society of America, vol. 116, no. 5, pp. 3075–3089, 2004.
-
[19]
Yekutiel Avargel and Israel Cohen, “On multiplicative transfer function approximation in the short-time Fourier transform domain,” IEEE Signal Process. Lett., vol. 14, no. 5, pp. 337–340, 2007.
-
[20]
Gary W. Elko, “Superdirectional microphone arrays,” Acoustic Signal Processing for Telecommunication, pp. 181–237, 2000.
-
[21]
John Eargle, The Microphone Book: From Mono to Stereo to Surround – A Guide to Microphone Design and Application, Routledge, 2012.
-
[22]
Kristina Tesch and Timo Gerkmann, “Insights into deep non-linear filters for improved multi-channel speech enhancement,” IEEE Trans. Audio, Speech, Lang. Process., 2023.
-
[23]
Emanuël A. P. Habets, “RIR generator,” https://github.com/ehabets/RIR-Generator, 2020, commit 3cf914d.
-
[24]
Emanuël A. P. Habets, “Monte Carlo RIR simulation,” https://github.com/audiolabs/MonteCarloRIRSimulation, 2026, commit d464a10.
-
[25]
Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur, “LibriSpeech: An ASR corpus based on public domain audio books,” in Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 5206–5210.
-
[26]
Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinji Watanabe, Alexander Richard, and Timo Gerkmann, “EARS: An anechoic fullband speech dataset benchmarked for speech enhancement and dereverberation,” in Proc. Interspeech Conf., 2024, pp. 4873–4877.
-
[27]
ITU-R, “Recommendation ITU-R BS.1770-5: Algorithms to measure audio programme loudness and true-peak audio level,” 2023.
-
[28]
Emmanuel Vincent, Rémi Gribonval, and Cédric Févotte, “Performance measurement in blind audio source separation,” IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1462–1469, 2006.
-
[29]
Matteo Torcoli, Mhd Modar Halimeh, and Emanuël A. P. Habets, “PESQ for P.862.2,” https://github.com/audiolabs/PESQ, 2025, commit d11671a.
-
[30]
Matteo Torcoli, Mhd Modar Halimeh, and Emanuël A. P. Habets, “Navigating PESQ: Up-to-date versions and open implementations,” in Speech Communication; 16th ITG Conference, VDE, 2025, pp. 51–55.
-
[31]
Takuya Yoshioka, Hideyuki Tachibana, Tomohiro Nakatani, and Masato Miyoshi, “Adaptive dereverberation of speech signals with speaker-position change detection,” in Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2009, pp. 3733–3736.
-
[32]
Weilong Huang, Cheng Xue, Jinwei Feng, and W. Bastiaan Kleijn, “A practical online multichannel dereverberation approach with data-reuse technique,” in Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2024, pp. 501–505.
-
[33]
Oliver Thiergart and Emanuël A. P. Habets, “Extracting reverberant sound using a linearly constrained minimum variance spatial filter,” IEEE Signal Process. Lett., 2014.
-
[34]
Emanuël A. P. Habets, “DAS generator,” https://github.com/ehabets/das-generator, 2025, commit 6f2cd6d.
-
[35]
Michael Williams, “The stereophonic zoom,” Rycote Microphone Windshields Ltd and Human Computer Interface, Gloucestershire (UK), 2002.