
arxiv: 2606.04210 · v2 · pith:KKQVNCSC · submitted 2026-06-02 · eess.AS · cs.LG · cs.SD

Representation Matters in Randomized Smoothing for Audio Classification

Pith reviewed 2026-06-30 10:52 UTC · model grok-4.3

classification: eess.AS · cs.LG · cs.SD
keywords: randomized smoothing · audio classification · robustness certification · preprocessing pipeline · certified radius · log-mel features · perturbation model

The pith

Randomized smoothing for audio classification is under-specified without an explicit choice of the certified object and preprocessing policy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that randomized smoothing certifies robustness in the vector space where its noise is added, and that audio pipelines leave this space ambiguous because they normalize, range-control, and transform waveforms into log-mel or other spectral features. On keyword-spotting and environmental-sound tasks, the authors compare smoothing applied directly to waveforms, to extracted features, and after post-processing. Their diagnostics show that identical noise levels produce different effective perturbation scales and certified accuracies depending on where the noise is added and how the signal is scaled. The central recommendation is that studies choose and report the task-specific certified object together with the perturbation model, noise location, gain policy, raw radius, and any geometry changes applied after the noise.

Core claim

Direct randomized smoothing (RS) on audio is under-specified unless the certified object and preprocessing policy are explicit. On two audio benchmarks, waveform, feature-space, and post-processed smoothing yield different results: at σ = 0.0025 the two datasets share the same median raw radius (0.007996) but have different SNR-equivalent scales (83.98 vs. 90.97 dB); log-mel smoothing gives higher positive-radius certified accuracy on environmental sounds (68.42% vs. 65.53%); and clipping or peak normalization changes the effective perturbation norm by roughly 230–351×.
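The shared median raw radius can be related to the standard certification rule of Cohen et al. [1]: lower-bound the top-class probability with a one-sided Clopper-Pearson interval [26], then certify an ℓ2 radius σ·Φ⁻¹(pA_lower) in the space where the noise was added. The sketch below is a minimal pure-Python version of that rule, not the authors' code. It omits the separate class-selection pass, the function names are hypothetical, and the Monte-Carlo budget n = 10,000 and confidence level α = 0.001 are assumptions, since this page does not state them. Under those assumptions, an input whose noisy samples all vote for one class reaches a radius ceiling of about 0.0080 at σ = 0.0025, numerically close to the reported median of 0.007996. That is consistent with, but does not establish, the median example on both datasets sitting at the sampling ceiling, which would explain why raw radii match while SNR-equivalent scales do not.

```python
import math
from statistics import NormalDist


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for numerical stability."""
    if k <= 0 or p >= 1.0:
        return 1.0
    if p <= 0.0:
        return 0.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    top = max(log_terms)
    return math.exp(top) * sum(math.exp(t - top) for t in log_terms)


def clopper_pearson_lower(k: int, n: int, alpha: float) -> float:
    """One-sided (1 - alpha) Clopper-Pearson lower confidence bound on k / n."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    # P(X >= k) grows monotonically with p, so bisect for the p where it equals alpha.
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if binom_upper_tail(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo


def certify_radius(top_count: int, n: int, sigma: float, alpha: float):
    """L2 radius sigma * Phi^{-1}(pA_lower) in the noise space; None means abstain."""
    p_lower = clopper_pearson_lower(top_count, n, alpha)
    if p_lower <= 0.5:
        return None
    return sigma * NormalDist().inv_cdf(p_lower)


# Hypothetical budget: n = 10,000 noisy samples at alpha = 0.001, all voting for one class.
r_ceiling = certify_radius(10_000, 10_000, sigma=0.0025, alpha=0.001)
print(f"radius ceiling at sigma=0.0025: {r_ceiling:.4f}")
```

Note that the radius is expressed in whatever space received the noise; nothing in this computation knows whether that space was a raw waveform, a gain-adjusted waveform, or log-mel features.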

What carries the argument

Explicit comparison of smoothing locations (waveform, log-mel feature space, post-processed) together with the reported quantities of raw radius, SNR-equivalent scale, and certified accuracy under each policy.
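To make the three smoothing locations concrete, here is a toy Monte-Carlo sketch of a smoothed classifier that injects the same Gaussian σ into the waveform, into a feature vector, or into the waveform followed by peak normalization. Everything in it is a hypothetical stand-in: frame log-energy replaces a real log-mel front end, a threshold rule replaces a trained model, and reading "post-processed" as "noise, then peak normalization" is an assumption about the paper's setup.

```python
import math
import random
from collections import Counter


def frame_log_energy(wave, frame=160, eps=1e-10):
    """Toy feature map (stand-in for log-mel): log mean energy of non-overlapping frames."""
    return [
        math.log(sum(s * s for s in wave[i:i + frame]) / frame + eps)
        for i in range(0, len(wave) - frame + 1, frame)
    ]


def peak_normalize(wave):
    """Post-noise gain policy: rescale so the largest absolute sample is 1."""
    peak = max(abs(s) for s in wave)
    return [s / peak for s in wave]


def toy_classifier(features, threshold=-8.0):
    """Hypothetical base classifier: thresholds the mean frame log-energy."""
    return "loud" if sum(features) / len(features) > threshold else "quiet"


def smoothed_votes(wave, sigma, n, location, rng):
    """Count base-classifier votes under Gaussian noise added at `location`."""
    votes = Counter()
    for _ in range(n):
        if location == "feature":
            feats = [f + rng.gauss(0.0, sigma) for f in frame_log_energy(wave)]
        else:
            noisy = [s + rng.gauss(0.0, sigma) for s in wave]
            if location == "post":
                noisy = peak_normalize(noisy)
            elif location != "waveform":
                raise ValueError(f"unknown location: {location}")
            feats = frame_log_energy(noisy)
        votes[toy_classifier(feats)] += 1
    return votes


rng = random.Random(0)
quiet_clip = [0.001 * math.sin(2 * math.pi * 440 * t / 16_000) for t in range(1_600)]
for location in ("waveform", "feature", "post"):
    print(location, dict(smoothed_votes(quiet_clip, sigma=0.0025, n=50, location=location, rng=rng)))
```

With the same σ = 0.0025, the waveform and feature variants keep this quiet clip below the toy threshold, while peak normalization after the noise inflates its energy so every vote flips. The base classifier never changed; only where the noise entered and how the signal was rescaled did.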

If this is right

  • At the same smoothing level, datasets with different waveform energies produce different SNR-equivalent scales even when raw radii match.
  • Log-mel smoothing can certify more examples with nonzero radius on environmental sounds, but the guarantee applies over features rather than raw waveforms.
  • Clipping or peak normalization alters the effective perturbation norm by factors of roughly 230 to 351.
  • Meaningful comparison across audio RS papers requires reporting perturbation location, gain policy, raw radius, and post-noise geometry changes.
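The first bullet can be checked with simple arithmetic. This page does not define the SNR-equivalent scale, so the minimal sketch below assumes it is 20·log10(‖x‖₂ / r) for a waveform x and raw ℓ2 radius r; the paper may normalize differently (for example, by per-sample RMS). The synthetic 440 Hz tone and its amplitudes are hypothetical.

```python
import math


def l2_norm(signal):
    """Euclidean norm of a waveform given as a list of samples."""
    return math.sqrt(sum(s * s for s in signal))


def snr_equivalent_db(waveform, radius):
    """Assumed definition: signal-to-perturbation ratio of an L2 radius, in dB."""
    return 20.0 * math.log10(l2_norm(waveform) / radius)


radius = 0.007996  # identical raw radius for both recordings
quiet = [0.05 * math.sin(2 * math.pi * 440 * t / 16_000) for t in range(16_000)]
loud = [2.0 * s for s in quiet]  # same content at twice the amplitude

quiet_db = snr_equivalent_db(quiet, radius)
loud_db = snr_equivalent_db(loud, radius)
print(f"quiet: {quiet_db:.2f} dB, loud: {loud_db:.2f} dB, gap: {loud_db - quiet_db:.2f} dB")
```

Under this assumed definition the gap is exactly 20·log10(2) ≈ 6.02 dB: the raw radius is identical, yet it represents a relatively smaller perturbation for the louder signal.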

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same representation ambiguity likely appears in any domain whose standard pipelines include normalization or feature extraction before the classifier.
  • Future certification frameworks could treat the entire preprocessing pipeline as part of the certified object rather than an external detail.

Load-bearing premise

The preprocessing pipeline is fixed and known when the smoothing noise is added, so that a radius certified in one space can be interpreted as a meaningful guarantee in the original audio domain.

What would settle it

A controlled test in which the identical smoothing sigma is applied to the same audio files but under two different normalizations or feature transforms, producing materially different certified radii or accuracies that cannot be reconciled without knowing the pipeline.
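A minimal version of that controlled test, restricted to the perturbation itself, is sketched below. It assumes the effective perturbation norm is measured as ‖P(x + δ) − P(x)‖₂ / ‖δ‖₂ for a post-noise step P, and uses a hypothetical quiet tone with σ = 0.0025; the ratios it prints come from synthetic data and are not the paper's 230–351× figures.

```python
import math
import random


def l2_norm(signal):
    return math.sqrt(sum(s * s for s in signal))


def clip(wave, limit):
    """Range control: hard-clip every sample to [-limit, limit]."""
    return [max(-limit, min(limit, s)) for s in wave]


def peak_normalize(wave):
    """Gain policy: rescale so the largest absolute sample becomes 1."""
    peak = max(abs(s) for s in wave)
    return [s / peak for s in wave]


def effective_norm_ratio(wave, noise, post):
    """||post(x + delta) - post(x)||_2 divided by ||delta||_2."""
    clean = post(wave)
    noisy = post([s + d for s, d in zip(wave, noise)])
    return l2_norm([a - b for a, b in zip(noisy, clean)]) / l2_norm(noise)


rng = random.Random(0)
wave = [0.005 * math.sin(2 * math.pi * 440 * t / 16_000) for t in range(16_000)]
noise = [rng.gauss(0.0, 0.0025) for _ in wave]

policies = {
    "none": lambda w: w,
    "clip to +/-0.005": lambda w: clip(w, 0.005),
    "peak normalize": peak_normalize,
}
for name, post in policies.items():
    print(f"{name:>16}: {effective_norm_ratio(wave, noise, post):7.2f}x")
```

Clipping is 1-Lipschitz and leaves this clean tone unchanged, so it can only shrink the perturbation. Peak normalization rescales both the noise and the clean signal by an input-dependent gain, which for a quiet recording inflates the effective norm by a large factor. The same σ therefore certifies very different objects depending on the post-noise step.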

Original abstract

Randomized smoothing (RS) certifies robustness in the vector space where Gaussian noise is added. In audio classification, this space is often not uniquely defined as standard pipelines normalize, range-control, and transform waveforms into log-mel or other spectral features. We show that direct RS is therefore under-specified unless the certified object and preprocessing policy are explicit. On two audio benchmarks, keyword spotting and environmental-sound classification, we study waveform, feature-space, and post-processed smoothing. Our diagnostics show why representation-aware reporting is necessary: at the same smoothing level $\sigma=0.0025$, the two datasets share the same median raw radius $.007996$, but different waveform energies yield different SNR-equivalent scales ($83.98$ vs. $90.97$ dB); log-mel smoothing gives higher positive-radius certified accuracy on environmental sounds ($68.42\%$ vs. $65.53\%$), certifying more examples with nonzero radius but over features rather than waveforms; and clipping or peak normalization changes the effective perturbation norm by roughly $230$--$351\times$. We therefore recommend that audio RS studies choose and report the task-specific certified object and perturbation model, including the perturbation location, gain policy, raw radius, and any post-noise geometry changes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript claims that randomized smoothing (RS) for audio classification is under-specified unless the certified object (waveform versus feature space) and preprocessing policy (normalization, clipping, log-mel transform) are made explicit. It supports the claim via empirical diagnostics on keyword-spotting and environmental-sound classification benchmarks, showing that σ = 0.0025 yields the same median raw radius of 0.007996 on both datasets but dataset-dependent SNR-equivalent scales (83.98 dB vs. 90.97 dB), that log-mel smoothing yields higher positive-radius certified accuracy on environmental sounds (68.42% vs. 65.53%), and that clipping or peak normalization alters effective perturbation norms by roughly 230–351×.

Significance. If the reported differences hold, the work is significant for the audio robustness literature because it supplies concrete, reproducible evidence that radius interpretation depends on the chosen representation and policy. The diagnostics directly quantify the practical consequences of under-specification and motivate the recommended reporting elements (perturbation location, gain policy, raw radius, post-noise geometry).

minor comments (3)
  1. [Abstract] The median raw radius is written as .007996; adopt consistent scientific notation or four-decimal precision to match the SNR values.
  2. [Abstract] The manuscript refers to “two audio benchmarks” and “dataset-dependent” SNR without naming the exact corpora or splits in the abstract; add the dataset identifiers for immediate clarity.
  3. Ensure that the experimental section supplies the precise model architectures, training hyperparameters, and number of Monte-Carlo samples used for certification so that the reported accuracy and radius differences can be reproduced.
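The third comment matters because the number of Monte-Carlo samples caps the radius that can be certified at all. Assuming the Cohen et al. [1] procedure with a one-sided Clopper-Pearson bound [26], when all n noisy samples agree the lower bound reduces to α^(1/n), so the largest certifiable ℓ2 radius is σ·Φ⁻¹(α^(1/n)). The sketch below tabulates that ceiling; α = 0.001 and the grid of n values are assumptions chosen for illustration, and `radius_ceiling` is a hypothetical helper name.

```python
from statistics import NormalDist


def radius_ceiling(n: int, sigma: float, alpha: float = 0.001) -> float:
    """Largest certifiable L2 radius when all n noisy samples vote for the top class."""
    p_lower = alpha ** (1.0 / n)  # Clopper-Pearson lower bound when k == n
    return sigma * NormalDist().inv_cdf(p_lower)


for n in (100, 1_000, 10_000, 100_000):
    print(f"n={n:>7}: ceiling={radius_ceiling(n, sigma=0.0025):.6f}")
```

Because the ceiling grows with n, two studies using the same σ but different sample budgets can report different raw radii for identical models, which is one reason to report n alongside σ.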

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review and recommendation to accept. The summary accurately reflects our central claim that randomized smoothing in audio requires explicit specification of the certified representation and preprocessing policy.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper advances a methodological claim that direct randomized smoothing on audio is under-specified without an explicit choice of certified object (waveform vs. feature space) and preprocessing policy, and supports it entirely with empirical diagnostics: identical σ yields an identical median raw radius but dataset-dependent SNR because of differing waveform energy; log-mel smoothing raises positive-radius certified accuracy on environmental sounds (68.42% vs. 65.53%); and clipping or peak normalization changes the effective perturbation norm by 230–351×. These observations are direct measurements on two benchmarks and do not reduce to a fitted parameter renamed as a prediction, a self-definitional equation, or a load-bearing self-citation. The argument is diagnostic and rests on external audio benchmarks, with no derivation chain that collapses to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard randomized-smoothing guarantee that a certified radius implies robustness in the space where the noise is added, together with the domain assumption that audio pipelines contain non-invertible or norm-altering steps whose effects must be modeled explicitly.

axioms (2)
  • [standard math] Gaussian noise addition yields a certified radius in the space where the noise is added
    Core property of randomized smoothing invoked throughout the abstract.
  • [domain assumption] Audio preprocessing (normalization, log-mel transform, clipping) alters the effective perturbation norm between waveform and feature spaces
    Invoked when the abstract states that clipping or peak normalization changes the perturbation norm by roughly 230–351×.
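For reference, the guarantee behind the first axiom as stated by Cohen et al. [1], followed by one standard way (not taken from the paper) to read a radius certified after a preprocessing step back in the waveform domain. Here T is a hypothetical name for the fixed map from waveform to the space where noise is added, f is the base classifier, g the smoothed classifier, and L an assumed Lipschitz constant of T on the inputs of interest.

```latex
g(z) = \arg\max_{c}\; \Pr_{\delta \sim \mathcal{N}(0,\sigma^{2} I)}\bigl[f(z + \delta) = c\bigr]

R = \frac{\sigma}{2}\Bigl(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\Bigr)
  = \sigma\,\Phi^{-1}(\underline{p_A}) \quad \text{if } \overline{p_B} = 1 - \underline{p_A}

\|T(x) - T(x')\|_2 \le L\,\|x - x'\|_2 \ \text{and}\ \|x - x'\|_2 < R / L
  \;\Longrightarrow\; g(T(x')) = g(T(x))
```

For a fixed linear gain T(x) = a·x this gives L = |a|, so the waveform radius is R/|a|. For a log-mel front end, the logarithm makes the local Lipschitz constant grow sharply as energy approaches zero, so R/L can be tiny for quiet inputs and a feature-space radius need not imply a meaningful waveform guarantee.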

pith-pipeline@v0.9.1-grok · 5771 in / 1344 out tokens · 48348 ms · 2026-06-30T10:52:27.027693+00:00 · methodology


Reference graph

Works this paper leans on

26 extracted references

  [1] Cohen, Jeremy; Rosenfeld, Elan; Kolter, Zico (2019).
  [2] Lecuyer, Mathias; Atlidakis, Vaggelis; Geambasu, Roxana; Hsu, Daniel; Jana, Suman (2019).
  [3] Li, Bai; Chen, Changyou; Wang, Wenlin; Carin, Lawrence.
  [4] Salman, Hadi; Li, Jerry; Razenshteyn, Ilya; Zhang, Pengchuan; Zhang, Huan; Bubeck, Sebastien; Yang, Greg.
  [5] Dvijotham, Krishnamurthy; Hayes, Jamie; Balle, Borja; Kolter, Zico; Qin, Chongli; Gyorgy, Andras; Xiao, Kai; Gowal, Sven; Kohli, Pushmeet.
  [6] Pfrommer, Samuel; Anderson, Brendon G.; Sojoudi, Somayeh.
  [7] Warden, Pete.
  [8] Piczak, Karol J. (2015).
  [9] Gemmeke, Jort F.; Ellis, Daniel P. W.; Freedman, Dylan; Jansen, Aren; Lawrence, Wade; Moore, R. Channing; Plakal, Manoj; Ritter, Marvin (2017).
  [10] Hershey, Shawn; Chaudhuri, Sourish; Ellis, Daniel P. W.; Gemmeke, Jort F.; Jansen, Aren; Moore, R. Channing; Plakal, Manoj; Platt, Devin; Saurous, Rif A.; Seybold, Bryan; Slaney, Malcolm; Weiss, Ron J.; Wilson, Kevin (2017).
  [11] Kong, Qiuqiang; Cao, Yin; Iqbal, Turab; Wang, Yuxuan; Wang, Wenwu; Plumbley, Mark D.
  [12] Gong, Yuan; Chung, Yu-An; Glass, James.
  [13] Zeghidour, Neil; Teboul, Olivier; de Chaumont Quitry, Felix; Tagliasacchi, Marco.
  [14] Lostanlen, Vincent; Salamon, Justin; Cartwright, Mark; McFee, Brian; Farnsworth, Andrew; Kelling, Steve; Bello, Juan Pablo.
  [15] Snyder, David; Chen, Guoguo; Povey, Daniel.
  [16] Ko, Tom; Peddinti, Vijayaditya; Povey, Daniel; Seltzer, Michael L.; Khudanpur, Sanjeev (2017).
  [17] Park, Daniel S.; Chan, William; Zhang, Yu; Chiu, Chung-Cheng; Zoph, Barret; Cubuk, Ekin D.; Le, Quoc V.
  [18] Carlini, Nicholas; Mishra, Pratyush; Vaidya, Tavish; Zhang, Yuankai; Sherr, Micah; Shields, Clay; Wagner, David; Zhou, Wenchao.
  [19] Carlini, Nicholas; Wagner, David (2018).
  [20] Qin, Yao; Carlini, Nicholas; Cottrell, Garrison; Goodfellow, Ian; Raffel, Colin (2019).
  [21] Yuan, Xuejing; Chen, Yuxuan; Zhao, Yue; Long, Yunhui; Liu, Xiaokuan; Chen, Kai; Zhang, Shengzhi; Huang, Heqing; Wang, Xiaofeng; Gunter, Carl A.
  [22] Olivier, Raphael; Raj, Bhiksha (2021).
  [23] Korzh, Dmitrii; Karimov, Elvir; Pautov, Mikhail; Rogov, Oleg Y.; Oseledets, Ivan (2025).
  [24] Purohit, Harsh; Tanabe, Ryo; Ichige, Kenji; Endo, Takashi; Nikaido, Yuki; Suefusa, Kaori; Kawaguchi, Yohei.
  [25] Nishida, Tomoya; Harada, Noboru; Niizumi, Daisuke; Albertini, Davide; Sannino, Roberto; Pradolini, Simone; Augusti, Filippo; Imoto, Keisuke; Dohi, Kota; Purohit, Harsh; Endo, Takashi; Kawaguchi, Yohei.
  [26] Clopper, Charles J.; Pearson, Egon S.