Dual-Granularity Orthogonal Disentanglement for Generalizable Audio Deepfake Detection

2026 · cs.SD · arXiv 2606.16532

1 Pith paper cites this work.

abstract

Audio deepfake detectors often fail to generalize across speakers because they learn speaker-identity features rather than synthesis artifacts, a failure mode known as implicit identity leakage. Existing methods address this leakage but incur architectural complexity or training instability. This paper proposes a dual-granularity orthogonal disentanglement framework that enforces feature independence at two levels: sample-level cosine orthogonality captures directional decorrelation, while batch-level cross-covariance regularization eliminates linear correlations across embedding dimensions. A curriculum disentanglement schedule progressively strengthens the orthogonality constraint without auxiliary networks or adversarial dynamics. Experiments on the ASVspoof 2019 LA, ASVspoof 2021 DF, and In-the-Wild datasets show that the proposed method achieves equal error rates (EER) of 1.35%, 7.88%, and 21.58%, respectively, surpassing gradient reversal disentanglement by 2.60% absolute on cross-dataset transfer.
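The two constraints and the schedule named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the squared-cosine form of the sample-level term, the 1/n covariance normalization, the linear ramp, and the unweighted sum of the two terms are all assumptions made for illustration. Embeddings are plain lists of floats so that the example runs with the standard library only.

```python
import math

def cosine_orthogonality(spoof, speaker, eps=1e-8):
    """Sample level: mean squared cosine similarity of paired embeddings.

    Assumed form; it is zero when every spoof/speaker pair is orthogonal.
    """
    total = 0.0
    for a, b in zip(spoof, speaker):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        cos = dot / (norm_a * norm_b + eps)
        total += cos * cos
    return total / len(spoof)

def cross_covariance_penalty(spoof, speaker):
    """Batch level: squared Frobenius norm of the cross-covariance matrix.

    Assumed 1/n normalization; it is zero when no spoof dimension is
    linearly correlated with any speaker dimension over the batch.
    """
    n = len(spoof)
    d_a, d_b = len(spoof[0]), len(speaker[0])
    mean_a = [sum(row[i] for row in spoof) / n for i in range(d_a)]
    mean_b = [sum(row[j] for row in speaker) / n for j in range(d_b)]
    penalty = 0.0
    for i in range(d_a):
        for j in range(d_b):
            cov = sum(
                (spoof[k][i] - mean_a[i]) * (speaker[k][j] - mean_b[j])
                for k in range(n)
            ) / n
            penalty += cov * cov
    return penalty

def curriculum_weight(step, total_steps, max_weight=1.0):
    """Hypothetical linear ramp from 0 to max_weight over total_steps."""
    return max_weight * min(1.0, step / total_steps)

def disentanglement_loss(spoof, speaker, step, total_steps):
    """Curriculum-weighted sum of the sample-level and batch-level terms."""
    weight = curriculum_weight(step, total_steps)
    return weight * (
        cosine_orthogonality(spoof, speaker)
        + cross_covariance_penalty(spoof, speaker)
    )
```

In a training loop, a term like `disentanglement_loss` would presumably be added to the detection loss, with `step` advancing so that the constraint starts weak and tightens over time. Note that the two terms measure different things: a batch can be pairwise orthogonal at the sample level and still carry a nonzero cross-covariance, which is one way to read the motivation for enforcing both granularities.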

representative citing papers

Dual-Granularity Orthogonal Disentanglement for Generalizable Audio Deepfake Detection

cs.SD · 2026-06-15 · unverdicted · novelty 4.0

Dual-granularity orthogonal disentanglement framework achieves EERs of 1.35%, 7.88%, and 21.58% on ASVspoof 2019 LA, ASVspoof 2021 DF, and In-the-Wild datasets, outperforming gradient reversal by 2.60% on cross-dataset transfer.