Online Single-Channel Audio-Based Sound Speed Estimation for Robust Multi-Channel Audio Control
Pith reviewed 2026-05-15 21:18 UTC · model grok-4.3
The pith
Sound speed can be estimated online from a single microphone by minimizing mismatch between measured audio and a parametric acoustic model during normal multichannel playback.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An online estimator recovers sound speed from single-channel observations during general multichannel playback by minimizing the structured mismatch between the recorded signal and the output of a parametric propagation model; simulations confirm that the estimates track true speed variations across input types and that feeding them back into a sound zone control algorithm reduces spatial errors compared with using a fixed nominal speed.
What carries the argument
Mismatch minimization between single-channel measurement and a parametric acoustic propagation model, with sound speed as the free parameter.
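The mechanism can be illustrated with a minimal toy sketch. Everything below is our own invention for illustration (a single direct path, white-noise playback, a fractional-delay-by-interpolation propagation model, and an arbitrary distance and speed grid), not the paper's actual model: a "measured" signal is a delayed copy of the playback, and sound speed is recovered as the grid point minimizing the squared mismatch.

```python
# Toy sketch of the core idea: recover sound speed c by minimizing the
# mismatch between a single-channel "measurement" and a parametric delay
# model. All values (distance, sample rate, grid) are illustrative
# assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
fs = 8000.0                       # sample rate [Hz]
t = np.arange(2048) / fs          # time axis [s]
x = rng.standard_normal(t.size)   # arbitrary playback signal (signal-agnostic)
d = 2.5                           # assumed loudspeaker-to-mic distance [m]
c_true = 346.0                    # "true" sound speed [m/s]

def delayed(sig, delay_s):
    """Fractional delay by linear interpolation (toy propagation model)."""
    return np.interp(t - delay_s, t, sig, left=0.0)

y = delayed(x, d / c_true)        # single-channel measurement

def mismatch(c):
    """Squared error between measurement and model output at speed c."""
    return np.sum((y - delayed(x, d / c)) ** 2)

grid = np.linspace(330.0, 360.0, 601)
c_hat = grid[np.argmin([mismatch(c) for c in grid])]
print(round(c_hat, 1))            # prints 346.0
```

The estimate lands on the true speed because, in this noiseless toy, the cost is exactly zero only when the model delay matches the measurement delay.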
If this is right
- The estimator runs continuously during ordinary playback, eliminating separate calibration phases.
- Tracking remains accurate across different audio signals without requiring special test tones.
- Using the estimates to correct propagation distances measurably improves sound zone control performance.
- Only one additional microphone is needed, lowering hardware requirements for robust multi-channel systems.
Where Pith is reading between the lines
- Consumer devices with built-in microphones could adapt sound fields in real time to changing room conditions.
- The single-channel formulation may extend to other acoustic parameters if the model is expanded to include them.
- Integration into adaptive filters could allow joint estimation of speed and other slowly varying room properties.
Load-bearing premise
The main reason the measured audio differs from the model is variation in sound speed, and the model is accurate enough that one microphone channel yields a unique speed estimate.
What would settle it
Run the estimator on recorded data where room reflections or loudspeaker nonlinearity dominate the error; if the recovered speed deviates systematically from the known true value or multiple speeds produce equally low mismatch, the method fails.
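This falsification test can be prototyped in a few lines. The sketch below is a toy under invented assumptions (single direct path, white-noise playback, one unmodeled reflection with a made-up path length and gain): the measurement contains a reflection the model does not know about, and we inspect how far the mismatch minimum drifts from the true speed.

```python
# Toy probe of the failure mode above: does an unmodeled reflection pull
# the mismatch minimum away from the true sound speed? Geometry and
# reflection gain are invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
fs = 8000.0
t = np.arange(2048) / fs
x = rng.standard_normal(t.size)
d_direct, d_refl = 2.5, 4.1       # assumed direct / reflected path lengths [m]
c_true = 343.0

def delayed(sig, delay_s):
    return np.interp(t - delay_s, t, sig, left=0.0)

# Measurement includes an unmodeled reflection; the model knows only the
# direct path, mimicking the reflection-dominated scenario described above.
y = delayed(x, d_direct / c_true) + 0.4 * delayed(x, d_refl / c_true)

grid = np.linspace(330.0, 360.0, 601)
costs = np.array([np.sum((y - delayed(x, d_direct / c)) ** 2) for c in grid])
c_hat = grid[int(np.argmin(costs))]
print(abs(c_hat - c_true))        # deviation induced by the reflection
```

A systematic deviation here, or a flat cost valley with several near-equal minima, would be exactly the evidence against the method that this criterion calls for.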
Original abstract
Robust spatial audio control relies on accurate acoustic propagation models, yet environmental variations, especially changes in the speed of sound, cause systematic mismatches that degrade performance. Existing methods either assume known sound speed, require multiple microphones, or rely on separate calibration, making them impractical for systems with minimal sensing. We propose an online sound speed estimator that operates during general multichannel audio playback and requires only a single observation microphone. The method exploits the structured effect of sound speed on the reproduced signal and estimates it by minimizing the mismatch between the measured audio and a parametric acoustic model. Simulations show accurate tracking of sound speed for diverse input signals and improved spatial control performance when the estimates are used to compensate propagation errors in a sound zone control framework.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an online sound speed estimator for multi-channel audio systems that requires only a single observation microphone during normal playback. It minimizes the mismatch between the measured signal and a parametric acoustic propagation model whose sound-speed parameter is the estimation target, with simulations used to demonstrate accurate tracking across diverse inputs and improved performance when the estimates are fed into a sound-zone control framework.
Significance. If the estimator can be shown to recover sound speed uniquely from single-channel data, the approach would meaningfully reduce hardware requirements for adaptive spatial audio control. The online, signal-agnostic operation is a practical strength, and the simulation-based demonstration of control improvement is a positive indicator. However, the absence of identifiability analysis and quantitative error metrics limits the immediate significance until those elements are strengthened.
major comments (3)
- §3.2, Eq. (7)–(9): the mismatch cost is minimized with respect to sound speed alone, yet no identifiability analysis, Hessian evaluation, or sensitivity study is supplied to show that the minimum remains unique when unmodeled effects (reflection coefficients, microphone response, loudspeaker directivity) are present. This directly affects the central claim that a single channel suffices for reliable estimation.
- §4, simulation results: the abstract and results section state “accurate tracking” and “improved control performance,” but no quantitative metrics (RMSE, bias, convergence time, or statistics over multiple trials and SNR conditions) or input-signal specifications are reported, preventing assessment of the robustness asserted in the abstract.
- §5, control framework: the performance gain is shown only for the proposed estimator versus a fixed incorrect sound speed; an ablation against alternative single-channel estimators or joint estimation of additional parameters is missing, so the incremental benefit attributable to the sound-speed estimator cannot be isolated.
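The Hessian evaluation requested in the first comment can be prototyped cheaply even in a toy setting. The sketch below (single direct path, white-noise playback; distance, sample rate, and step size are our assumptions, not the paper's) approximates the scalar Hessian of the mismatch cost at the true speed with a central finite difference and checks that it is positive, i.e., that the minimum is locally unique for this model.

```python
# Numerical curvature check of the mismatch cost at its minimum, as a
# toy stand-in for the Hessian evaluation the referee asks for. All
# quantities are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
fs = 8000.0
t = np.arange(2048) / fs
x = rng.standard_normal(t.size)   # arbitrary playback signal
d, c_true = 2.5, 343.0            # assumed distance [m] and true speed [m/s]

def delayed(sig, delay_s):
    return np.interp(t - delay_s, t, sig, left=0.0)

y = delayed(x, d / c_true)

def cost(c):
    return np.sum((y - delayed(x, d / c)) ** 2)

# Central second difference approximates d^2(cost)/dc^2 at c_true; a
# clearly positive value indicates a locally unique minimum.
h = 0.5
curvature = (cost(c_true + h) - 2.0 * cost(c_true) + cost(c_true - h)) / h**2
print(curvature > 0.0)            # prints True
```

A full identifiability analysis would repeat this with the unmodeled effects (reflections, microphone response, directivity) injected into the measurement.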
minor comments (3)
- The parametric model definition in §2 would benefit from an explicit block diagram showing the signal path from loudspeaker to single microphone.
- Notation for the time-of-flight term linear in 1/c should be introduced once and used consistently; occasional reuse of c for both speed and a generic constant is confusing.
- Figure captions should list the exact room dimensions, loudspeaker positions, and input-signal class used for each simulation panel.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help strengthen the manuscript. We address each major point below with specific revisions.
Point-by-point responses
- Referee: §3.2, Eq. (7)–(9): the mismatch cost is minimized with respect to sound speed alone, yet no identifiability analysis, Hessian evaluation, or sensitivity study is supplied to show that the minimum remains unique when unmodeled effects (reflection coefficients, microphone response, loudspeaker directivity) are present. This directly affects the central claim that a single channel suffices for reliable estimation.
Authors: We agree that a formal identifiability analysis is needed to support the central claim. In the revised manuscript, we will add an analysis of the cost function, including evaluation of its Hessian at the minimum and a sensitivity study to unmodeled effects such as reflection coefficients, microphone response, and loudspeaker directivity. This will demonstrate uniqueness under the model's assumptions and quantify robustness to practical mismatches. revision: yes
- Referee: §4, simulation results: the abstract and results section state “accurate tracking” and “improved control performance,” but no quantitative metrics (RMSE, bias, convergence time, or statistics over multiple trials and SNR conditions) or input-signal specifications are reported, preventing assessment of the robustness asserted in the abstract.
Authors: We acknowledge the absence of quantitative metrics limits assessment of the results. In the revised Section 4, we will report RMSE, bias, convergence times, and statistics across multiple trials and SNR conditions. Input-signal specifications (e.g., signal types, durations, and frequency content) will also be detailed to substantiate the claims of accurate tracking and robustness. revision: yes
- Referee: §5, control framework: the performance gain is shown only for the proposed estimator versus a fixed incorrect sound speed; an ablation against alternative single-channel estimators or joint estimation of additional parameters is missing, so the incremental benefit attributable to the sound-speed estimator cannot be isolated.
Authors: We agree that isolating the estimator's contribution requires additional comparisons. In the revision, we will include ablations against alternative single-channel estimators where feasible and discuss the scope of joint estimation of other parameters. This will better highlight the incremental benefit of the proposed approach over a fixed sound speed. revision: partial
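As a concrete illustration of the metrics promised in the second response, the hedged Monte-Carlo sketch below computes bias and RMSE of a grid-search speed estimator over repeated noisy trials. The model, SNR, trial count, and geometry are all invented here; nothing is taken from the paper's experiments.

```python
# Toy Monte-Carlo evaluation: bias and RMSE of a grid-search sound-speed
# estimator across noisy trials. All parameters are illustrative
# assumptions, not the paper's simulation setup.
import numpy as np

fs = 8000.0
t = np.arange(1024) / fs
d, c_true = 2.5, 343.0
grid = np.linspace(330.0, 360.0, 301)   # 0.1 m/s resolution

def delayed(sig, delay_s):
    return np.interp(t - delay_s, t, sig, left=0.0)

def estimate(rng, snr_db=20.0):
    """One trial: delayed white-noise playback plus measurement noise."""
    x = rng.standard_normal(t.size)
    y = delayed(x, d / c_true)
    y = y + rng.standard_normal(t.size) * np.std(y) * 10.0 ** (-snr_db / 20.0)
    costs = [np.sum((y - delayed(x, d / c)) ** 2) for c in grid]
    return grid[int(np.argmin(costs))]

rng = np.random.default_rng(2)
est = np.array([estimate(rng) for _ in range(20)])
bias = est.mean() - c_true
rmse = np.sqrt(np.mean((est - c_true) ** 2))
print(round(bias, 2), round(rmse, 2))
```

Sweeping `snr_db` and the trial count would yield exactly the statistics-versus-SNR tables the referee requests.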
Circularity Check
No circularity: standard parameter estimation from acoustic model mismatch
Full rationale
The derivation consists of minimizing a mismatch cost between single-channel observations and a parametric propagation model whose structure (time-of-flight or phase terms linear in 1/c) is taken from established acoustic theory rather than fitted to the target data. The estimate of c is the argmin of that cost; it is not defined in terms of itself, nor is any prediction forced by construction from the inputs. No self-citation chain is invoked to justify uniqueness or the model form, and the abstract reports simulation tracking as external validation. The procedure is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Acoustic propagation between loudspeakers and a single observation microphone can be represented by a parametric model whose dominant variable is sound speed.
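Read formally, this assumption can be sketched as follows (the symbols are ours, not the paper's): with loudspeaker signals x_ℓ(t), path lengths d_ℓ, and gains g_ℓ, the single-channel measurement is a sum of scaled copies whose delays are linear in 1/c, and the estimate is the mismatch minimizer:

```latex
y(t) \approx \sum_{\ell=1}^{L} g_\ell \, x_\ell\!\left(t - \frac{d_\ell}{c}\right),
\qquad
\hat{c} = \arg\min_{c} \int \Bigl( y(t) - \sum_{\ell=1}^{L} g_\ell \, x_\ell\!\left(t - \tfrac{d_\ell}{c}\right) \Bigr)^{2} \, dt .
```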
Reference graph
Works this paper leans on
- [1] VAST approach: "The filters are derived by minimizing the weighted mean-squared error between the desired and reproduced signals, ξ(q) = q^T R_B q + µ q^T R_D q − 2 q^T r_B + ‖d_B‖²_2, (8) where µ is a weighting parameter, r_B = H_B^T d_B, and R_B = H_B H_B^T and R_D = H_D H_D^T are the spatial covariance matrices corresponding to the BZ and DZ [19]. Using the Variable-Span Trade-off (VA..."
- [2] T. Betlehem, L. Krishnan, and P. Teal, "Temperature Robust Active-Compensated Sound Field Reproduction Using Impulse Response Shaping," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Apr. 2018, pp. 481–485.
- [3] M. B. Møller, J. K. Nielsen, E. Fernandez-Grande, and S. K. Olesen, "On the Influence of Transfer Function Noise on Sound Zone Control in a Room," IEEE/ACM Trans. Audio, Speech Lang. Process., vol. 27, no. 9, pp. 1405–1418, 2019.
- [4] P. Coleman, P. J. B. Jackson, M. Olik, M. Møller, M. Olsen, and J. A. Pedersen, "Acoustic contrast, planarity and robustness of sound zone methods using a circular loudspeaker array," J. Acoust. Soc. Amer., vol. 135, no. 4, pp. 1929–1940, Apr. 2014.
- [5] S. S. Bhattacharjee, A. J. Fuglsig, F. Christensen, J. R. Jensen, and M. Græsbøll Christensen, "Robust Fixed-Filter Sound Zone Control with Audio-Based Position Tracking," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Apr. 2025, pp. 1–5.
- [6] S. S. Bhattacharjee, J. R. Jensen, and M. G. Christensen, "Sound Speed Perturbation Robust Audio: Impulse Response Correction and Sound Zone Control," IEEE/ACM Trans. Audio, Speech Lang. Process., vol. 33, pp. 2008–2020, 2025.
- [7] D. Caviedes Nozal, F. M. Heuchel, F. T. Agerkvist, and J. Brunskog, "The effect of atmospheric conditions on sound propagation and its impact on the outdoor sound field control," in INTER-NOISE and NOISE-CON Congress and Conference Proceedings, vol. 259, 2019, pp. 7211–7220.
- [8] F. M. Heuchel, D. Caviedes-Nozal, J. Brunskog, F. T. Agerkvist, and E. Fernandez-Grande, "Large-scale outdoor sound field control," J. Acoust. Soc. Amer., vol. 148, no. 4, pp. 2392–2402, Oct. 2020.
- [9] J. Brunnstrom, M. B. Møller, J. Østergaard, T. van Waterschoot, M. Moonen, and F. Elvander, "Spatial covariance estimation for sound field reproduction using kernel ridge regression," in Proc. European Signal Process. Conf., Sep. 2025.
- [10] J. Zhang, L. Shi, M. G. Christensen, W. Zhang, L. Zhang, and J. Chen, "CGMM-Based Sound Zone Generation Using Robust Pressure Matching With ATF Perturbation Constraints," IEEE/ACM Trans. Audio, Speech Lang. Process., vol. 31, pp. 3331–3345, 2023.
- [11] M. Hu, L. Shi, H. Zou, M. G. Christensen, and J. Lu, "Sound zone control with fixed acoustic contrast and simultaneous tracking of acoustic transfer functions," J. Acoust. Soc. Amer., vol. 153, no. 5, p. 2538, May 2023.
- [12] J. Zhang, J. Xie, D. Shi, W. Zhang, J. Chen, and J. Benesty, "An Alternating Mode Strategy for Adaptive Sound Field Control and Acoustic Path Tracking," in Asia Pacific Signal and Inf. Process. Asso. Annu. Summit and Conf., Oct. 2025, p. 422.
- [13] P. Annibale, J. Filos, P. A. Naylor, and R. Rabenstein, "TDOA-Based Speed of Sound Estimation for Air Temperature and Room Geometry Inference," IEEE/ACM Trans. Audio, Speech Lang. Process., vol. 21, no. 2, pp. 234–246, Feb. 2013.
- [14] A. Mahajan and M. Walworth, "3D position sensing using the differences in the time-of-flights from a wave source to various receivers," IEEE Trans. Robot. Autom., vol. 17, no. 1, pp. 91–94, Feb. 2001.
- [15] J.-S. Hu, C.-Y. Chan, C.-K. Wang, M.-T. Lee, and C.-Y. Kuo, "Simultaneous Localization of a Mobile Robot and Multiple Sound Sources Using a Microphone Array," Adv. Robotics, vol. 25, no. 1-2, pp. 135–152, Jan. 2011.
- [16] C. Othmani, N. S. Dokhanchi, S. Merchel, A. Vogel, M. E. Altinsoy, and C. Voelker, "A review of the state-of-the-art approaches in detecting time-of-flight in room impulse responses," Sensors and Actuators A: Physical, vol. 374, Aug. 2024.
- [17] O. A. Godin, V. G. Irisov, and M. I. Charnotskii, "Passive acoustic measurements of wind velocity and sound speed in air," J. Acoust. Soc. Amer., vol. 135, no. 2, pp. EL68–EL74, Jan. 2014.
- [18] M. E. Anderson and G. E. Trahey, "The direct estimation of sound speed using pulse–echo ultrasound," J. Acoust. Soc. Amer., vol. 104, no. 5, pp. 3099–3106, Nov. 1998.
- [19] M. F. S. Gálvez, S. J. Elliott, and J. Cheer, "Time domain optimization of filters used in a loudspeaker array for personal audio," IEEE/ACM Trans. Audio, Speech Lang. Process., vol. 23, no. 11, pp. 1869–1878, 2015.
- [20] T. Lee, J. K. Nielsen, J. R. Jensen, and M. G. Christensen, "A unified approach to generating sound zones using variable span linear filters," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2018, pp. 491–495.
- [21] M. G. Christensen, Introduction to Audio Processing. Cham: Springer International Publishing, 2019.
- [22] T. Lee, L. Shi, J. K. Nielsen, and M. G. Christensen, "Fast Generation of Sound Zones Using Variable Span Trade-Off Filters in the DFT-Domain," IEEE/ACM Trans. Audio, Speech Lang. Process., vol. 29, pp. 363–378, 2021.
- [23] E. A. Habets, Room impulse response generator, Oct. 2020.
- [24] J. Richter et al., "EARS: An anechoic fullband speech dataset benchmarked for speech enhancement and dereverberation," in ISCA Interspeech, 2024, pp. 4873–4877.
- [25] D. Snyder, G. Chen, and D. Povey, MUSAN: A Music, Speech, and Noise Corpus, arXiv:1510.08484v1, 2015.