pith. machine review for the scientific record.

arxiv: 2602.16416 · v2 · submitted 2026-02-18 · 📡 eess.AS · cs.SD

Recognition: no theorem link

Online Single-Channel Audio-Based Sound Speed Estimation for Robust Multi-Channel Audio Control

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 21:18 UTC · model grok-4.3

classification 📡 eess.AS cs.SD
keywords sound speed estimation · online estimation · multichannel audio · spatial audio control · acoustic modeling · single microphone · sound zone control · propagation compensation

The pith

Sound speed can be estimated online from a single microphone by minimizing mismatch between measured audio and a parametric acoustic model during normal multichannel playback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an estimator that tracks changes in sound speed in real time while audio plays through multiple speakers, using only one observation microphone. It works by adjusting a parameter in an acoustic model until the predicted signal best matches what the microphone actually records. This addresses a practical problem: sound speed varies with temperature and other factors, creating mismatches that degrade spatial audio techniques like sound zone control. If the approach holds, systems can compensate automatically without dedicated calibration steps or extra sensors, keeping performance stable during everyday use.

Core claim

An online estimator recovers sound speed from single-channel observations during general multichannel playback by minimizing the structured mismatch between the recorded signal and the output of a parametric propagation model. Simulations confirm that the estimates track true speed variations across input types, and that feeding them back into a sound zone control algorithm reduces spatial errors compared with using a fixed nominal speed.

What carries the argument

Mismatch minimization between single-channel measurement and a parametric acoustic propagation model, with sound speed as the free parameter.
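The abstract does not spell out the estimator's equations, so the mechanism can only be illustrated schematically. A minimal sketch, assuming a toy free-field model (known loudspeaker signals and distances, one pure delay per path, grid search over candidate speeds) rather than the paper's actual formulation — all function names here are illustrative:

```python
import numpy as np

def delay_signal(s, tau, fs):
    """Apply a (circular) fractional delay of tau seconds via the FFT."""
    n = len(s)
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    return np.fft.irfft(np.fft.rfft(s) * np.exp(-2j * np.pi * f * tau), n)

def simulate_mic(signals, distances, c, fs):
    """Toy free-field model: each loudspeaker signal arrives delayed by
    its distance divided by the candidate sound speed c."""
    mic = np.zeros(len(signals[0]))
    for s, d in zip(signals, distances):
        mic += delay_signal(s, d / c, fs)
    return mic

def estimate_speed(mic, signals, distances, fs, candidates):
    """Grid search: pick the speed whose model output best matches the
    measured single-channel signal (least-squares mismatch)."""
    costs = [np.sum((mic - simulate_mic(signals, distances, c, fs)) ** 2)
             for c in candidates]
    return candidates[int(np.argmin(costs))]

rng = np.random.default_rng(0)
fs = 8000
signals = [rng.standard_normal(fs) for _ in range(3)]  # 3 loudspeakers, 1 s each
distances = [1.2, 2.0, 2.7]                            # metres (assumed known)
mic = simulate_mic(signals, distances, 348.0, fs)      # "measurement" at 348 m/s

candidates = np.arange(330.0, 360.0, 0.5)
print(estimate_speed(mic, signals, distances, fs, candidates))  # → 348.0
```

The essential structure matches what the abstract describes: sound speed is the only free parameter, and the cost compares the model output against the single observed channel. The paper's model and optimizer are presumably richer (room acoustics, recursive online updates), which this sketch deliberately omits.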

If this is right

  • The estimator runs continuously during ordinary playback, eliminating separate calibration phases.
  • Tracking remains accurate across different audio signals without requiring special test tones.
  • Using the estimates to correct propagation distances measurably improves sound zone control performance.
  • Only one additional microphone is needed, lowering hardware requirements for robust multi-channel systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Consumer devices with built-in microphones could adapt sound fields in real time to changing room conditions.
  • The single-channel formulation may extend to other acoustic parameters if the model is expanded to include them.
  • Integration into adaptive filters could allow joint estimation of speed and other slowly varying room properties.

Load-bearing premise

The main reason the measured audio differs from the model is variation in sound speed, and the model is accurate enough that one microphone channel yields a unique speed estimate.

What would settle it

Run the estimator on recorded data where room reflections or loudspeaker nonlinearity dominate the error; if the recovered speed deviates systematically from the known true value or multiple speeds produce equally low mismatch, the method fails.
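The degeneracy half of this test can be probed even in a toy free-field setting: sweep candidate speeds and look for multiple near-zero mismatch minima. A sketch under those assumptions (single steady-state source at a known distance; not the paper's setup):

```python
import numpy as np

fs, n = 8000, 2000
t = np.arange(n) / fs
d, true_c = 2.0, 343.0                        # source distance (m), true speed (m/s)
candidates = np.linspace(330.0, 430.0, 1001)

def cost_curve(sig, candidates):
    """Mismatch between the 'measured' signal (delay d/true_c) and the model
    prediction at each candidate speed, for a steady-state source sig(t)."""
    measured = sig(t - d / true_c)
    return np.array([np.sum((measured - sig(t - d / c)) ** 2)
                     for c in candidates])

# Narrowband input: a pure 1 kHz tone.
tone = lambda x: np.sin(2 * np.pi * 1000.0 * x)

# Broadband input: 20 sinusoids at random frequencies and phases.
rng = np.random.default_rng(1)
fr = rng.uniform(100.0, 3000.0, 20)
ph = rng.uniform(0.0, 2 * np.pi, 20)
broad = lambda x: sum(np.sin(2 * np.pi * f * x + p) for f, p in zip(fr, ph))

for name, sig in [("tone", tone), ("broadband", broad)]:
    cst = cost_curve(sig, candidates)
    near = candidates[cst < 0.01 * cst.max()]  # speeds that fit almost perfectly
    print(f"{name}: near-perfect fits span {near.min():.1f}-{near.max():.1f} m/s")
```

For the tone, a second family of speeds near 414 m/s fits essentially as well as the true 343 m/s, because a 1 kHz tone cannot distinguish delays that differ by one period; the broadband input collapses the near-perfect fits to a narrow band around the true value. This is why the claim of accurate tracking "for diverse input signals" needs the narrowband case checked explicitly.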

Figures

Figures reproduced from arXiv: 2602.16416 by Andreas Jonas Fuglsig, Jesper Rindom Jensen, Mads Græsbøll Christensen.

Figure 1. Simulation setup for evaluating online sound speed … (image not reproduced here)
Figure 2. Tracking performance with one active loudspeaker, overlaid with the spectrogram of the non-filtered input signal. Showing the … (image not reproduced here)
Figure 3. SZC for VAST ranks V = 1 (a), V = 4041 (b), and V = LJ = 8000 (c) and different input signals (columns). For visibility the vertical axes are scaled according to performance between frames with speed changes. The proposed method (blue dashed) is compared to the uncorrected filters (red), oracle SICER (green), and GT performance (black) methods. We note that the sharp changes are caused by the simulated step change in … (image not reproduced here)
original abstract

Robust spatial audio control relies on accurate acoustic propagation models, yet environmental variations, especially changes in the speed of sound, cause systematic mismatches that degrade performance. Existing methods either assume known sound speed, require multiple microphones, or rely on separate calibration, making them impractical for systems with minimal sensing. We propose an online sound speed estimator that operates during general multichannel audio playback and requires only a single observation microphone. The method exploits the structured effect of sound speed on the reproduced signal and estimates it by minimizing the mismatch between the measured audio and a parametric acoustic model. Simulations show accurate tracking of sound speed for diverse input signals and improved spatial control performance when the estimates are used to compensate propagation errors in a sound zone control framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript proposes an online sound speed estimator for multi-channel audio systems that requires only a single observation microphone during normal playback. It minimizes the mismatch between the measured signal and a parametric acoustic propagation model whose sound-speed parameter is the estimation target, with simulations used to demonstrate accurate tracking across diverse inputs and improved performance when the estimates are fed into a sound-zone control framework.

Significance. If the estimator can be shown to recover sound speed uniquely from single-channel data, the approach would meaningfully reduce hardware requirements for adaptive spatial audio control. The online, signal-agnostic operation is a practical strength, and the simulation-based demonstration of control improvement is a positive indicator. However, the absence of identifiability analysis and quantitative error metrics limits the immediate significance until those elements are strengthened.

major comments (3)
  1. [§3.2, Eq. (7)–(9)] The mismatch cost is minimized with respect to sound speed alone, yet no identifiability analysis, Hessian evaluation, or sensitivity study is supplied to show that the minimum remains unique when unmodeled effects (reflection coefficients, microphone response, loudspeaker directivity) are present. This directly affects the central claim that a single channel suffices for reliable estimation.
  2. [§4, simulation results] The abstract and results section state “accurate tracking” and “improved control performance,” but no quantitative metrics (RMSE, bias, convergence time, or statistics over multiple trials and SNR conditions) or input-signal specifications are reported, preventing assessment of the robustness asserted in the abstract.
  3. [§5, control framework] The performance gain is shown only for the proposed estimator versus a fixed incorrect sound speed; an ablation against alternative single-channel estimators or joint estimation of additional parameters is missing, so the incremental benefit attributable to the sound-speed estimator cannot be isolated.
minor comments (3)
  1. [§2] The parametric model definition in §2 would benefit from an explicit block diagram showing the signal path from loudspeaker to single microphone.
  2. Notation for the time-of-flight term linear in 1/c should be introduced once and used consistently; occasional reuse of c for both speed and a generic constant is confusing.
  3. Figure captions should list the exact room dimensions, loudspeaker positions, and input-signal class used for each simulation panel.
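The metrics requested in major comment 2 are standard tracking statistics. A minimal sketch of how they would be computed from a per-frame estimate trajectory — the function, its parameters, and the synthetic trajectory below are illustrative, not from the paper:

```python
import numpy as np

def tracking_metrics(est, true_c, tol=0.5, frame_rate=100.0):
    """RMSE, bias, and convergence time for a per-frame speed-estimate
    trajectory. tol is the |error| band (m/s) that defines convergence;
    frame_rate is in estimation frames per second."""
    err = np.asarray(est) - true_c
    out = np.flatnonzero(np.abs(err) > tol)   # frames outside the error band
    if out.size == 0:
        conv = 0.0                            # inside the band from the start
    elif out[-1] + 1 < len(err):
        conv = (out[-1] + 1) / frame_rate     # last exit, then stays inside
    else:
        conv = None                           # never settles inside the band
    return {"rmse": float(np.sqrt(np.mean(err ** 2))),
            "bias": float(np.mean(err)),
            "convergence_s": conv}

# Hypothetical trajectory: starts at a nominal 343 m/s, relaxes to 348 m/s.
frames = np.arange(500)
est = 348.0 + (343.0 - 348.0) * np.exp(-frames / 30.0)
print(tracking_metrics(est, 348.0))
```

Reporting these three numbers per input-signal class and SNR condition, with statistics over repeated trials, would address the comment directly.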

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the manuscript. We address each major point below with specific revisions.

point-by-point responses
  1. Referee: [§3.2, Eq. (7)–(9)] The mismatch cost is minimized with respect to sound speed alone, yet no identifiability analysis, Hessian evaluation, or sensitivity study is supplied to show that the minimum remains unique when unmodeled effects (reflection coefficients, microphone response, loudspeaker directivity) are present. This directly affects the central claim that a single channel suffices for reliable estimation.

    Authors: We agree that a formal identifiability analysis is needed to support the central claim. In the revised manuscript, we will add an analysis of the cost function, including evaluation of its Hessian at the minimum and a sensitivity study to unmodeled effects such as reflection coefficients, microphone response, and loudspeaker directivity. This will demonstrate uniqueness under the model's assumptions and quantify robustness to practical mismatches. revision: yes

  2. Referee: [§4, simulation results] The abstract and results section state “accurate tracking” and “improved control performance,” but no quantitative metrics (RMSE, bias, convergence time, or statistics over multiple trials and SNR conditions) or input-signal specifications are reported, preventing assessment of the robustness asserted in the abstract.

    Authors: We acknowledge the absence of quantitative metrics limits assessment of the results. In the revised Section 4, we will report RMSE, bias, convergence times, and statistics across multiple trials and SNR conditions. Input-signal specifications (e.g., signal types, durations, and frequency content) will also be detailed to substantiate the claims of accurate tracking and robustness. revision: yes

  3. Referee: [§5, control framework] The performance gain is shown only for the proposed estimator versus a fixed incorrect sound speed; an ablation against alternative single-channel estimators or joint estimation of additional parameters is missing, so the incremental benefit attributable to the sound-speed estimator cannot be isolated.

    Authors: We agree that isolating the estimator's contribution requires additional comparisons. In the revision, we will include ablations against alternative single-channel estimators where feasible and discuss the scope of joint estimation of other parameters. This will better highlight the incremental benefit of the proposed approach over a fixed sound speed. revision: partial
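For a single scalar parameter, the Hessian evaluation promised in response 1 reduces to checking the curvature of the cost at the estimate. A minimal finite-difference sketch, using a stand-in quadratic cost rather than the paper's mismatch cost:

```python
import numpy as np

def curvature(cost, c_hat, h=0.1):
    """Central finite-difference second derivative of a scalar cost at c_hat.
    With one parameter, the 'Hessian' is this single number; a clearly
    positive value indicates a locally well-identified minimum."""
    return (cost(c_hat + h) - 2.0 * cost(c_hat) + cost(c_hat - h)) / h ** 2

# Stand-in for the mismatch cost: one well-separated minimum at 348 m/s.
cost = lambda c: (c - 348.0) ** 2 / 348.0 ** 2
print(curvature(cost, 348.0) > 0)   # → True: positive curvature at the minimum
```

Local curvature only establishes uniqueness near the minimum; ruling out distant competing minima (the referee's real concern) still requires a global sweep of the cost over the plausible speed range.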

Circularity Check

0 steps flagged

No circularity: standard parameter estimation from acoustic model mismatch

full rationale

The derivation consists of minimizing a mismatch cost between single-channel observations and a parametric propagation model whose structure (time-of-flight or phase terms linear in 1/c) is taken from established acoustic theory rather than fitted to the target data. The estimate of c is the argmin of that cost; it is not defined in terms of itself, nor is any prediction forced by construction from the inputs. No self-citation chain is invoked to justify uniqueness or the model form, and the abstract reports simulation tracking as external validation. The procedure is therefore self-contained against external benchmarks and receives the default non-circularity finding.
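The "linear in 1/c" structure noted above is what makes the estimate well-posed under the model: with unwrapped propagation phases, fitting 1/c is an ordinary linear least-squares problem. A sketch under free-field assumptions (the frequencies, distance, and noise level are illustrative):

```python
import numpy as np

# Free-field propagation phase at angular frequency w over distance d is
# phi(w) = -w * d / c: linear in the single unknown 1/c. Given unwrapped
# phase measurements at a few frequencies, estimating c reduces to a
# linear least-squares fit.
rng = np.random.default_rng(0)
d, true_c = 2.0, 348.0
w = 2 * np.pi * np.array([200.0, 500.0, 1000.0, 2000.0])   # rad/s
phi = -w * d / true_c + rng.normal(0.0, 1e-3, w.size)      # noisy unwrapped phases

# Solve phi = (-w * d) * x for x = 1/c in the least-squares sense.
x, *_ = np.linalg.lstsq((-w * d)[:, None], phi, rcond=None)
print(1.0 / x[0])   # ≈ 348 m/s
```

Note the linearity holds only for unwrapped phase; wrapped phase reintroduces exactly the multiple-minima ambiguity raised in the referee report.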

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; full derivation, model equations, and any fitted quantities are unavailable.

axioms (1)
  • domain assumption Acoustic propagation between loudspeakers and a single observation microphone can be represented by a parametric model whose dominant variable is sound speed.
    Invoked when the abstract states that the estimator minimizes mismatch to a parametric acoustic model.

pith-pipeline@v0.9.0 · 5432 in / 1200 out tokens · 27865 ms · 2026-05-15T21:18:13.046411+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 1 internal anchor

  1. [1] (internal anchor) VAST approach: “The filters are derived by minimizing the weighted mean-squared error between the desired and reproduced signals, ξ(q) = qᵀR_B q + µ qᵀR_D q − 2qᵀr_B + ‖d_B‖²₂, (8) where µ is a weighting parameter, r_B = H_Bᵀ d_B, and R_B = H_B H_Bᵀ and R_D = H_D H_Dᵀ are the spatial covariance matrices corresponding to the BZ and DZ [19]. Using the Variable-Span Trade-off (VAST) …”

  2. [2] T. Betlehem, L. Krishnan, and P. Teal, “Temperature Robust Active-Compensated Sound Field Reproduction Using Impulse Response Shaping,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Apr. 2018, pp. 481–485.

  3. [3] M. B. Møller, J. K. Nielsen, E. Fernandez-Grande, and S. K. Olesen, “On the Influence of Transfer Function Noise on Sound Zone Control in a Room,” IEEE/ACM Trans. Audio, Speech Lang. Process., vol. 27, no. 9, pp. 1405–1418, 2019.

  4. [4] P. Coleman, P. J. B. Jackson, M. Olik, M. Møller, M. Olsen, and J. A. Pedersen, “Acoustic contrast, planarity and robustness of sound zone methods using a circular loudspeaker array,” J. Acoust. Soc. Amer., vol. 135, no. 4, pp. 1929–1940, Apr. 2014.

  5. [5] S. S. Bhattacharjee, A. J. Fuglsig, F. Christensen, J. R. Jensen, and M. Græsbøll Christensen, “Robust Fixed-Filter Sound Zone Control with Audio-Based Position Tracking,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Apr. 2025, pp. 1–5.

  6. [6] S. S. Bhattacharjee, J. R. Jensen, and M. G. Christensen, “Sound Speed Perturbation Robust Audio: Impulse Response Correction and Sound Zone Control,” IEEE/ACM Trans. Audio, Speech Lang. Process., vol. 33, pp. 2008–2020, 2025.

  7. [7] D. Caviedes Nozal, F. M. Heuchel, F. T. Agerkvist, and J. Brunskog, “The effect of atmospheric conditions on sound propagation and its impact on the outdoor sound field control,” in INTER-NOISE and NOISE-CON Congress and Conference Proceedings, vol. 259, 2019, pp. 7211–7220.

  8. [8] F. M. Heuchel, D. Caviedes-Nozal, J. Brunskog, F. T. Agerkvist, and E. Fernandez-Grande, “Large-scale outdoor sound field control,” J. Acoust. Soc. Amer., vol. 148, no. 4, pp. 2392–2402, Oct. 2020.

  9. [9] J. Brunnstrom, M. B. Møller, J. Østergaard, T. van Waterschoot, M. Moonen, and F. Elvander, “Spatial covariance estimation for sound field reproduction using kernel ridge regression,” in Proc. European Signal Process. Conf., Sep. 2025.

  10. [10] J. Zhang, L. Shi, M. G. Christensen, W. Zhang, L. Zhang, and J. Chen, “CGMM-Based Sound Zone Generation Using Robust Pressure Matching With ATF Perturbation Constraints,” IEEE/ACM Trans. Audio, Speech Lang. Process., vol. 31, pp. 3331–3345, 2023.

  11. [11] M. Hu, L. Shi, H. Zou, M. G. Christensen, and J. Lu, “Sound zone control with fixed acoustic contrast and simultaneous tracking of acoustic transfer functions,” J. Acoust. Soc. Amer., vol. 153, no. 5, p. 2538, May 2023.

  12. [12] J. Zhang, J. Xie, D. Shi, W. Zhang, J. Chen, and J. Benesty, “An Alternating Mode Strategy for Adaptive Sound Field Control and Acoustic Path Tracking,” in Asia Pacific Signal and Inf. Process. Assoc. Annu. Summit and Conf., Oct. 2025, p. 422.

  13. [13] P. Annibale, J. Filos, P. A. Naylor, and R. Rabenstein, “TDOA-Based Speed of Sound Estimation for Air Temperature and Room Geometry Inference,” IEEE/ACM Trans. Audio, Speech Lang. Process., vol. 21, no. 2, pp. 234–246, Feb. 2013.

  14. [14] A. Mahajan and M. Walworth, “3D position sensing using the differences in the time-of-flights from a wave source to various receivers,” IEEE Trans. Robot. Autom., vol. 17, no. 1, pp. 91–94, Feb. 2001.

  15. [15] J.-S. Hu, C.-Y. Chan, C.-K. Wang, M.-T. Lee, and C.-Y. Kuo, “Simultaneous Localization of a Mobile Robot and Multiple Sound Sources Using a Microphone Array,” Adv. Robotics, vol. 25, no. 1-2, pp. 135–152, Jan. 2011.

  16. [16] C. Othmani, N. S. Dokhanchi, S. Merchel, A. Vogel, M. E. Altinsoy, and C. Voelker, “A review of the state-of-the-art approaches in detecting time-of-flight in room impulse responses,” Sensors and Actuators A: Physical, vol. 374, Aug. 2024.

  17. [17] O. A. Godin, V. G. Irisov, and M. I. Charnotskii, “Passive acoustic measurements of wind velocity and sound speed in air,” J. Acoust. Soc. Amer., vol. 135, no. 2, EL68–EL74, Jan. 2014.

  18. [18] M. E. Anderson and G. E. Trahey, “The direct estimation of sound speed using pulse–echo ultrasound,” J. Acoust. Soc. Amer., vol. 104, no. 5, pp. 3099–3106, Nov. 1998.

  19. [19] M. F. S. Gálvez, S. J. Elliott, and J. Cheer, “Time domain optimization of filters used in a loudspeaker array for personal audio,” IEEE/ACM Trans. Audio, Speech Lang. Process., vol. 23, no. 11, pp. 1869–1878, 2015.

  20. [20] T. Lee, J. K. Nielsen, J. R. Jensen, and M. G. Christensen, “A unified approach to generating sound zones using variable span linear filters,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2018, pp. 491–495.

  21. [21] M. G. Christensen, Introduction to Audio Processing. Cham: Springer International Publishing, 2019.

  22. [22] T. Lee, L. Shi, J. K. Nielsen, and M. G. Christensen, “Fast Generation of Sound Zones Using Variable Span Trade-Off Filters in the DFT-Domain,” IEEE/ACM Trans. Audio, Speech Lang. Process., vol. 29, pp. 363–378, 2021.

  23. [23] E. A. Habets, Room impulse response generator, Oct. 2020.

  24. [24] J. Richter et al., “EARS: An anechoic fullband speech dataset benchmarked for speech enhancement and dereverberation,” in ISCA Interspeech, 2024, pp. 4873–4877.

  25. [25] D. Snyder, G. Chen, and D. Povey, MUSAN: A Music, Speech, and Noise Corpus, arXiv:1510.08484v1, 2015.