{"total":15,"items":[{"citing_arxiv_id":"2606.30791","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Probing-Guided Layer Selection from Self-Supervised Speech Models for Generalizable Audio Deepfake Detection","primary_cat":"cs.SD","submitted_at":"2026-06-29T18:19:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Probing-guided selection of depth zones from frozen SSL speech models yields compact classifiers with 28% relative EER improvement on cross-domain deepfake detection tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23201","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MixFake: Benchmarking and Enhancing Audio Deepfake Detection in Diverse Real-world Mixed Audio","primary_cat":"cs.SD","submitted_at":"2026-05-22T03:33:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MixFake is a new benchmark for mixed-authenticity audio and a multi-stream prompt tuning method achieves 0.95% EER foreground and 7.72% absolute gain in complex background deepfake detection.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20266","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook","primary_cat":"cs.SD","submitted_at":"2026-05-18T20:21:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A survey of Large Audio Language Models that establishes a taxonomy of trustworthiness vulnerabilities and proposes a Defense-in-Depth roadmap for audio 
intelligence.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"of the underlying security threats and safety mechanisms. Although an earlier review has addressed trustworthiness in speech [37], they precede the recent shift toward unified generative frameworks, focusing largely on traditional machine learning. And specialized surveys remain predominantly concentrated on singular issues such as the detection of deepfakes and biometric authentication [38]-[40]. A comparison with these existing audio surveys is provided in Table 1, illustrating the lack of literature dedicated to the implications of trustworthiness of these models. TABLE 1 Comparison with existing surveys."},{"citing_arxiv_id":"2605.07241","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Asymmetric Phase Coding Audio Watermarking","primary_cat":"cs.CR","submitted_at":"2026-05-08T04:54:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"APC embeds compact Ed25519 signatures into audio phase data with error correction to achieve 97.5-98.3% cryptographic verification under eight attack types at mean PESQ 3.02.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"URL https://arxiv.org/abs/2106.15561. [3] Nicolas M. Müller, Philip Czempin, Thorsten Holz, and Konstantin Böttinger. Does audio deepfake detection generalize? arXiv preprint arXiv:2203.16263, 2022. [4] Zahra Khanjani, Gabrielle Watson, and Vandana P. Janeja. Audio deepfakes: A survey. Frontiers in Big Data, 5:1001063, 2022. doi: 10.3389/fdata.2022.1001063. [5] Jiangyan Yi, Chenglong Wang, Jianhua Tao, Xiaohui Zhang, Chu Yuan Zhang, and Yan Zhao. 
Audio deepfake detection: A survey. arXiv preprint arXiv:2308.14970, 2023. URL https://arxiv.org/abs/2308.14970. [6] Menglu Li, Yasaman Ahmadiadli, and Xiao-Ping Zhang. Audio anti-spoofing detection: A survey. arXiv preprint arXiv:2404.13914, 2024. URL https://arxiv."},{"citing_arxiv_id":"2605.03079","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Phoneme-Level Deepfake Detection Across Emotional Conditions Using Self-Supervised Embeddings","primary_cat":"cs.SD","submitted_at":"2026-05-04T18:49:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Phoneme-level analysis using self-supervised embeddings identifies higher divergence in complex vowels and fricatives for emotional voice conversion deepfakes, enabling more interpretable detection across emotions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.02223","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Toward Fine-Grained Speech Inpainting Forensics: A Dataset, Method, and Metric for Multi-Region Tampering Localization","primary_cat":"cs.SD","submitted_at":"2026-05-04T04:54:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A new dataset, iterative coarse-to-fine localization framework, and segment-level IoU F1 metric tackle the open problem of detecting multiple unknown word-level inpainted regions in speech.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.19949","ref_index":80,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic 
Languages","primary_cat":"eess.AS","submitted_at":"2026-04-21T19:54:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Introduces the Indic-CodecFake dataset for Indic codec deepfakes and SATYAM, a novel hyperbolic ALM that outperforms baselines through dual-stage semantic-prosodic fusion using Bhattacharya distance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16254","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics","primary_cat":"cs.SD","submitted_at":"2026-04-17T17:14:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ArtifactNet extracts codec residuals from spectrograms with a 4M-parameter network to detect AI music at F1=0.9829 and 1.49% FPR on unseen tracks from 22 generators, outperforming larger baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.13400","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Classical Machine Learning Baselines for Deepfake Audio Detection on the Fake-or-Real Dataset","primary_cat":"eess.AS","submitted_at":"2026-04-15T01:59:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"RBF SVM achieves ~93% accuracy and ~7% EER on deepfake audio detection using prosodic and spectral features from the FoR dataset at 44.1 kHz and 16 kHz sampling rates.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.12650","ref_index":46,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Listening Deepfake Detection: A New Perspective Beyond 
Speaking-Centric Forgery Analysis","primary_cat":"cs.CV","submitted_at":"2026-04-14T12:20:07+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Introduces the LDD task, ListenForge dataset built from five listening head generation methods, and MANet model that detects listening forgeries via motion inconsistencies guided by audio semantics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.02913","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Split and Conquer Partial Deepfake Speech","primary_cat":"cs.SD","submitted_at":"2026-04-03T09:33:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A two-stage boundary detection plus segment classification method with multi-length training achieves state-of-the-art results for detecting and localizing partial deepfakes on PartialSpoof and Half-Truth benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"threaten security-critical systems such as automatic speaker verification (ASV) and voice-based authentication. In response, a substantial body of work on speech anti-spoofing and audio deepfake detection has emerged, supported and systematized by the ASVspoof challenge series [1, 2, 3] and has been reviewed in depth in recent surveys on audio deepfake detection and anti-spoofing [4, 5]. However, the majority of existing countermeasures assume that each utterance is either fully bona fide or fully spoofed and therefore operate at the utterance level. 
A more realistic and increasingly emphasized threat model is partial manipulation,"},{"citing_arxiv_id":"2604.04951","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Synthetic Trust Attacks: Modeling How Generative AI Manipulates Human Decisions in Social Engineering Fraud","primary_cat":"cs.CR","submitted_at":"2026-04-02T23:09:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The paper proposes Synthetic Trust Attacks (STAs) as a formal threat model with an eight-stage attack chain (STAM) that shifts defense focus from detecting synthetic media to protecting human decision processes in social engineering.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.09007","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Gender Fairness in Audio Deepfake Detection: Performance and Disparity Analysis","primary_cat":"cs.SD","submitted_at":"2026-03-09T22:52:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Fairness metrics uncover gender disparities in audio deepfake detection error distributions that standard Equal Error Rate metrics obscure.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.24674","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Advancing Zero-Shot Open-Set Speech Deepfake Source Tracing","primary_cat":"eess.AS","submitted_at":"2025-09-29T12:14:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A zero-shot open-set speech deepfake source tracing framework using adapted SSL-AASIST embeddings and AAM loss achieves EER of 16.43% in OOD trials with cosine scoring, outperforming 
few-shot alternatives.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.20799","ref_index":70,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AuthGlass: Benchmarking Voice Liveness Detection and Authentication on Smart Glasses via Comprehensive Acoustic Features","primary_cat":"cs.HC","submitted_at":"2025-09-25T06:27:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The AuthGlass dataset and proposed multi-modal models achieve state-of-the-art results on voice liveness detection and user authentication for smart glasses.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}