Recognition: 2 theorem links
Asymmetric Phase Coding Audio Watermarking
Pith reviewed 2026-05-11 01:53 UTC · model grok-4.3
The pith
Phase-coded audio watermark verifies at 98% after attacks
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
APC combines Ed25519 digital signatures (64-byte) with Reed-Solomon error correction, pseudo-random STFT phase-bin selection, and a redundant quantization-index-modulation code on log-magnitude differences of adjacent bin pairs, yielding a compact, non-repudiable, blind-extractable watermark that verifies at 97.5 to 98.3 percent on 1000 LibriSpeech clips under eight attack configurations at mean PESQ of 3.02.
What carries the argument
A keyed pseudo-random selection of STFT phase bins, combined with QIM on the log-magnitude differences of adjacent bin pairs, embeds the Ed25519 signature and later allows it to be extracted blind.
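A minimal sketch of what this carrier could look like, assuming a scalar QIM whose parity convention matches the paper's soft statistic σ_n = −cos(π(ℓ₁−ℓ₂)/Δ). The function names, SHA-256 seeding, and even/odd lattice choice are illustrative assumptions, not the released implementation:

```python
import hashlib
import math
import random

DELTA = 1.0  # QIM step size (a free parameter in the paper)

def select_bin_pairs(key: bytes, n_bins: int, n_pairs: int):
    """Keyed pseudo-random choice of disjoint adjacent STFT bin pairs."""
    rng = random.Random(hashlib.sha256(key).digest())
    starts = rng.sample(range(0, n_bins - 1, 2), n_pairs)
    return [(i, i + 1) for i in starts]

def qim_embed(d: float, bit: int) -> float:
    """Quantize the log-magnitude difference d onto the lattice for `bit`:
    even multiples of DELTA carry 0, odd multiples carry 1."""
    return DELTA * (2 * round((d / DELTA - bit) / 2) + bit)

def qim_extract(d: float) -> int:
    """Blind soft decision: the sign of -cos(pi*d/DELTA) picks the bit."""
    return 1 if -math.cos(math.pi * d / DELTA) > 0 else 0
```

Under this model a perturbation of the difference smaller than Δ/2 leaves the extracted bit unchanged, which is the margin the attack-robustness numbers trade against the PESQ cost of a larger Δ.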
If this is right
- The watermark supports blind extraction without access to the original audio.
- Verification rates remain high (97.5-98.3%) under cropping, low-pass filtering, resampling, and re-encoding attacks.
- Computational cost is low at tens of milliseconds per clip on CPU.
- Audio quality is preserved with average PESQ score of 3.02.
Where Pith is reading between the lines
- This approach could be combined with other watermarking methods to create more resilient systems against both known and unknown attacks.
- Applying similar phase coding techniques to video or image signals might extend the provenance protection to other media types.
- Key management practices such as regular updates to the bin selection seed could further strengthen resistance to potential adaptive attackers.
Load-bearing premise
The pseudo-random STFT phase-bin selection and QIM encoding on log-magnitude pairs remain both imperceptible and extractable under the eight real-world attack configurations when the attacker lacks knowledge of the exact bin selection key.
What would settle it
An adaptive white-box attack that targets the specific phase bins and magnitude pairs to erase the watermark, resulting in verification rates below 90% on the test clips without severely degrading audio quality.
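Under the same assumed scalar-QIM model (step Δ, soft statistic σ = −cos(πd/Δ)), the erasure such an attacker would attempt is easy to sketch: snap each keyed difference to the nearest decision boundary, where the statistic vanishes, at a worst-case distortion of Δ/2 per pair. This is a toy model of the threat, not the paper's quantified white-box attack:

```python
import math

DELTA = 1.0  # assumed QIM step

def qim_soft(d: float) -> float:
    """Soft decision statistic: positive sign decodes to bit 1."""
    return -math.cos(math.pi * d / DELTA)

def erase(d: float) -> float:
    """White-box erasure: snap d onto the nearest decision boundary
    DELTA*(m + 1/2), where the soft statistic is exactly zero."""
    m = math.floor(d / DELTA)
    return DELTA * (m + 0.5)
```

The distortion bound Δ/2 is what connects the attack's success to audio quality: erasing every keyed pair costs at most half a quantization step per log-magnitude difference.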
Original abstract
The proliferation of deepfake audio challenges voice-based authentication systems; passive forensic detectors are sensitive to evolving generative models and to real-world channel distortions. We propose Asymmetric Phase Coding (APC), a training-free cryptographic signing layer for audio, designed as a compact and auditable provenance primitive that can stand alone or be stacked with learned watermarks. APC combines Ed25519 digital signatures (EdDSA, FIPS 186-5; 64-byte signatures) with Reed-Solomon error correction, pseudo-random STFT phase-bin selection, and a redundant quantization-index-modulation (QIM) code on log-magnitude differences of adjacent bin pairs, yielding a compact, non-repudiable, blind-extractable watermark. We evaluate APC on 1,000 LibriSpeech test-clean clips (10 s each, 44.1 kHz) under eight attack configurations -- identity, 10% end-cropping, 20% end-cropping, 8 kHz low-pass, 16 kHz round-trip resampling, FLAC re-encoding, MP3 at 128 kbps, and OGG-Vorbis at 128 kbps -- and achieve cryptographic verification rates between 97.5% and 98.3% on every condition at mean PESQ=3.02 and tens-of-milliseconds CPU latency. We explicitly compare APC against recent neural baselines (AudioSeal, WavMark, SilentCipher), detail the threat model (forgery resistance vs. erasure), characterize the dataset, define all metrics, quantify an adaptive white-box erasure attack, and release code, keys, and metadata for reproducibility.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Asymmetric Phase Coding (APC), a training-free cryptographic audio watermarking method that embeds compact Ed25519 signatures (protected by Reed-Solomon) into audio via keyed pseudo-random STFT phase-bin selection and redundant QIM on log-magnitude differences of adjacent bins. It evaluates the scheme on 1,000 LibriSpeech test-clean clips (10 s, 44.1 kHz) under eight attack configurations (identity, cropping, low-pass, resampling, re-encoding, MP3/OGG), reporting cryptographic verification rates of 97.5–98.3 % at mean PESQ 3.02 with low CPU latency, while comparing against neural baselines (AudioSeal, WavMark, SilentCipher), detailing the threat model (forgery vs. erasure), and releasing code, keys, and metadata.
Significance. If the reported performance holds under the stated conditions, APC offers a reproducible, auditable provenance primitive with cryptographic non-repudiation that can stand alone or layer with learned detectors against deepfake audio. The explicit release of implementation artifacts, the coherent use of established primitives (Ed25519, Reed-Solomon, STFT, QIM), and the inclusion of an adaptive white-box erasure quantification are notable strengths that support direct reproduction and extension.
major comments (1)
- [Evaluation] Evaluation section: verification rates are reported as point estimates (97.5–98.3 %) across 1,000 clips without error bars, standard deviations, or binomial confidence intervals; this weakens the claim of consistent performance under every attack condition and should be addressed with statistical quantification.
minor comments (2)
- [Abstract and Evaluation] Abstract and evaluation: the mean PESQ value of 3.02 is given without per-attack breakdown or variance; adding these would clarify imperceptibility trade-offs.
- [Method] Implementation details: while code is released, the manuscript should explicitly state the chosen QIM step size and number of redundant pairs per bit (listed as free parameters) to allow readers to assess sensitivity without inspecting the repository.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript and for the constructive comment on the evaluation section. We address the point below and will revise the paper accordingly.
Point-by-point responses
-
Referee: [Evaluation] Evaluation section: verification rates are reported as point estimates (97.5–98.3 %) across 1,000 clips without error bars, standard deviations, or binomial confidence intervals; this weakens the claim of consistent performance under every attack condition and should be addressed with statistical quantification.
Authors: We agree that statistical quantification would strengthen the claims of consistent performance. In the revised manuscript we will add binomial confidence intervals (Clopper-Pearson) for the verification rates under each of the eight attack conditions. We will also report the standard deviation of the per-clip verification outcomes to quantify consistency across the 1,000 LibriSpeech clips.
Revision: yes
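For context, the Clopper-Pearson interval the authors promise is computable exactly with stdlib tools; e.g. 978 successes out of 1,000 (a 97.8% point estimate) gives roughly a [0.967, 0.986] interval. A sketch via bisection on the binomial CDF (helper names are ours, not the authors'):

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _bisect(f, lo: float, hi: float, iters: int = 60) -> float:
    """Root of a monotone-decreasing f with f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) confidence interval for a binomial
    proportion with k successes in n trials."""
    lower = 0.0 if k == 0 else _bisect(
        lambda p: binom_cdf(k - 1, n, p) - (1 - alpha / 2), 0.0, 1.0)
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper
```

With n = 1,000 per condition the interval is about two percentage points wide, so the reported 97.5–98.3% spread across attacks is within sampling noise of a single underlying rate.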
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper builds APC from independent, externally established primitives (Ed25519 signatures per FIPS 186-5, Reed-Solomon codes, STFT phase-bin selection, and QIM on log-magnitude differences) and evaluates them empirically on external LibriSpeech data under eight explicitly listed attacks. No derivation step reduces by construction to fitted parameters, self-referential definitions, or load-bearing self-citations; the verification rates and PESQ scores are direct experimental outputs rather than renamed inputs. The construction and threat model remain independent of the reported results.
Axiom & Free-Parameter Ledger
free parameters (2)
- QIM quantization step size
- Number of redundant QIM pairs per signature bit
axioms (2)
- [standard math] Ed25519 provides unforgeable signatures under standard cryptographic assumptions
- [domain assumption] STFT phase modifications via QIM on adjacent bins remain perceptually transparent at the chosen parameters
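The two free parameters interact at decode time: more redundant pairs per bit buy robustness at the cost of capacity. A hedged sketch of how that redundancy might be consumed, assuming soft combining of the per-pair statistic σ = −cos(πd/Δ) (the paper's actual combiner is not specified here; the constants are illustrative):

```python
import math

DELTA = 1.0  # QIM step size (free parameter)
R = 5        # redundant QIM pairs per signature bit (free parameter)

def soft(d: float) -> float:
    """Per-pair soft statistic; positive leans toward bit 1."""
    return -math.cos(math.pi * d / DELTA)

def decode_bit(diffs) -> int:
    """Sum the soft statistics of the R redundant pairs, then threshold.
    Soft combining degrades gracefully when some pairs are hit harder
    than others, unlike a per-pair hard majority vote."""
    return 1 if sum(soft(d) for d in diffs) > 0 else 0
```

One corrupted pair out of five flips its own statistic at worst, so the combined decision survives as long as the intact pairs outweigh it.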
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/DimensionForcing.lean · reality_from_one_distinction (8-tick period) · echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
For each group of G = 8 consecutive frames and each k ∈ K, the offset Δφ[n] = φ_data[n] − φ(i₀, k) ... φ′(i, k) = φ(i, k) + Δφ[n], i ∈ g.
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (J-cost) · unclear
UNCLEAR: the relation between the paper passage and the cited Recognition theorem is ambiguous.
QIM encoding ... step Δ = 1.0 nat ... σ_n = −cos(π(ℓ₁ − ℓ₂)/Δ)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, R. J. Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, and Yonghui Wu. Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
- [2] Xu Tan, Tao Qin, Frank Soong, and Tie-Yan Liu. A survey on neural speech synthesis. arXiv preprint arXiv:2106.15561, 2021. URL https://arxiv.org/abs/2106.15561.
- [3] Nicolas M. Müller, Philip Czempin, Thorsten Holz, and Konstantin Böttinger. Does audio deepfake detection generalize? arXiv preprint arXiv:2203.16263, 2022.
- [4] Zahra Khanjani, Gabrielle Watson, and Vandana P. Janeja. Audio deepfakes: A survey. Frontiers in Big Data, 5:1001063, 2022. doi: 10.3389/fdata.2022.1001063.
- [5] Jiangyan Yi, Chenglong Wang, Jianhua Tao, Xiaohui Zhang, Chu Yuan Zhang, and Yan Zhao. Audio deepfake detection: A survey. arXiv preprint arXiv:2308.14970, 2023. URL https://arxiv.org/abs/2308.14970.
- [6] Menglu Li, Yasaman Ahmadiadli, and Xiao-Ping Zhang. Audio anti-spoofing detection: A survey. arXiv preprint arXiv:2404.13914, 2024. URL https://arxiv.org/abs/2404.13914.
- [7] Hongbin Liu, Moyang Guo, Zhengyuan Jiang, Lun Wang, and Neil Zhenqiang Gong. AudioMarkBench: Benchmarking robustness of audio watermarking. In Advances in Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2024. URL https://arxiv.org/abs/2406.06979.
- [8] Mitchell D. Swanson, Mei Kobayashi, and Ahmed H. Tewfik. Multimedia data-embedding and watermarking technologies. Proceedings of the IEEE, 86(6):1064–1087, 1998. doi: 10.1109/5.687830.
- [9] S. Josefsson and I. Liusvaara. Edwards-curve digital signature algorithm (EdDSA). IETF RFC 8032, 2017. URL https://www.rfc-editor.org/rfc/rfc8032.
- [10] NIST. Digital signature standard (DSS). FIPS 186-5, National Institute of Standards and Technology, 2023.
- [11] C2PA. C2PA implementation guidance. Coalition for Content Provenance and Authenticity, 2024. URL https://c2pa.org/specifications/specifications/2.1/guidance/Guidance.html.
- [12] Irving S. Reed and Gustave Solomon. Polynomial codes over certain finite fields. Journal of the Society for Industrial and Applied Mathematics, 8(2):300–304, 1960. doi: 10.1137/0108018.
- [13] Shu Lin and Daniel J. Costello. Error Control Coding. Pearson Prentice Hall, 2nd edition, 2004.
- [14] Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: An ASR corpus based on public domain audio books. In Proc. IEEE ICASSP, pages 5206–5210, 2015. doi: 10.1109/ICASSP.2015.7178964.
- [15] Robin San Roman, Pierre Fernandez, Alexandre Défossez, Teddy Furon, Tuan Tran, and Hady Elsahar. Proactive detection of voice cloning with localized watermarking. In Proc. ICML, 2024.
- [16] Guangyu Chen, Yu Wu, Shujie Liu, Tao Liu, Xiaoyong Du, and Furu Wei. WavMark: Watermarking for audio generation. arXiv preprint arXiv:2308.12770, 2023.
- [17] Mayank Kumar Singh, Naoya Takahashi, Weihsiang Liao, and Yuki Mitsufuji. SilentCipher: Deep audio watermarking. In Proc. INTERSPEECH, 2024.
- [18] C2PA. C2PA technical specification: Content provenance and authenticity. Coalition for Content Provenance and Authenticity, 2024. URL https://spec.c2pa.org/. Accessed: 2024.
- [19] Content Authenticity Initiative. CAI technical architecture. Content Authenticity Initiative, 2024. URL https://contentauthenticity.org.
- [20] Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, and Héctor Delgado. ASVspoof 2021: Accelerating progress in spoofed and deepfake speech detection. In Proc. 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge, 2021.
- [21] Daniel Gruhl, Walter Bender, and Anthony Lu. Echo hiding. In Information Hiding, First International Workshop, LNCS, volume 1174, pages 295–315. Springer, 1996.
- [22] I. J. Cox, J. Kilian, T. Leighton, and T. Shamoon. Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing, 6(12):1673–1687, 1997. doi: 10.1109/83.650120.
- [23] Walter Bender, Daniel Gruhl, Norishige Morimoto, and Anthony Lu. Techniques for data hiding. IBM Systems Journal, 35(3.4):313–336, 1996.
- [24] Wei Zeng, Haojun Ai, and Ruimin Hu. A novel steganalysis algorithm of phase coding in audio signal. In Sixth International Conference on Advanced Language Processing and Web Information Technology (ALPIT 2007), pages 261–264, 2007. doi: 10.1109/ALPIT.2007.41.
- [25] Nhut Minh Ngo, Brian Michael Kurkoski, and Masashi Unoki. Robust and reliable audio watermarking based on dynamic phase coding and error control coding. In Proc. EUSIPCO, pages 1616–1620, 2015. doi: 10.1109/EUSIPCO.2015.7362790.
- [26] N. Janakiraman, M. S. Samuel, M. R. Sumalatha, and T. John. The adaptive multi-level phase coding method in audio steganography. In 2019 IEEE 5th International Conference for Convergence in Technology (I2CT), pages 1–5, 2019. doi: 10.1109/I2CT45611.2019.8830467.
- [27] Shengbei Wang, Weitao Yuan, Zhen Zhang, Jianming Wang, and Masashi Unoki. Synchronous multi-bit audio watermarking based on phase shifting. In Proc. IEEE ICASSP, pages 2675–2679, 2021. doi: 10.1109/ICASSP39728.2021.9414307.
- [28] Chang Liu, Jie Zhang, Tianwei Zhang, Xi Yang, Weiming Zhang, and Nenghai Yu. Detecting voice cloning attacks via timbre watermarking. In Proc. Network and Distributed System Security Symposium (NDSS). Internet Society, 2024.
- [29] Daniel J. Bernstein, Niels Duif, Tanja Lange, Peter Schwabe, and Bo-Yin Yang. High-speed high-security signatures. Journal of Cryptographic Engineering, 2(2):77–89, 2012. doi: 10.1007/s13389-012-0027-1.
- [30] ITU-T. P.862.2: Wideband extension to Recommendation P.862 for the assessment of wideband telephone networks and speech codecs. ITU-T Recommendation, 2007. PESQ wideband.
- [31] Cees H. Taal, Richard C. Hendriks, Richard Heusdens, and Jesper Jensen. A short-time objective intelligibility measure for time-frequency weighted noisy speech. In Proc. IEEE ICASSP, pages 4214–4217, 2010. doi: 10.1109/ICASSP.2010.5495701.