pith. machine review for the scientific record.

archive

Every paper Pith has read. Search by title, abstract, or pith.

240 papers in eess.AS · page 1

  1. cs.SD 2026-05-14 reviewed
    SpeakerLLM turns speaker verification into natural-language reasoning

    SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning

    Ha-Jin Yu +4

  2. eess.AS 2026-05-13 reviewed
    Benchmark standardizes early Parkinson's speech detection

    A Benchmark for Early-stage Parkinson's Disease Detection from Speech

    Bastiaan R. Bloem +5

  3. eess.AS 2026-05-13 reviewed
    Framework filters FSD50K to single-source audio clips

    FSD50K-Solo: Automated Curation of Single-Source Sound Events

    Bryce Irvin +6

  4. eess.AS 2026-05-12 reviewed
    SMC dataset exposes tempo bias in state-of-the-art beat tracking models

    The SMC Blind Spot: A Failure Mode Analysis of State-of-the-Art Beat Tracking

    Jaehoon Ahn +2

  5. cs.SD 2026-05-12 reviewed
    STRUM turns raw audio into playable rhythm charts at 0.84 F1 for drums

    STRUM: A Spectral Transcription and Rhythm Understanding Model for End-to-End Generation of Playable Rhythm-Game Charts

    Joshua Opria

  6. eess.AS 2026-05-12 reviewed
    Modern ASR matches humans on enhanced speech but misleads on quality

    Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement

    Danilo de Oliveira +2

  7. eess.AS 2026-05-12 reviewed
    FM-Speech outperforms rivals on 14 fine-grained speech dimensions

    Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model

    Chuan Xie +11

  8. eess.AS 2026-05-12 reviewed
    Chunkwise Aligner matches Transducer accuracy at lower cost

    Chunkwise Aligners for Streaming Speech Recognition

    Masato Mimura +2

  9. eess.SP 2026-05-11 reviewed
    Lanczos Krylov method matches exact EVD for adaptive diagonal loading

    Adaptive Diagonal Loading using Krylov Subspaces for Robust Beamforming

    Andrew C. Singer +3

  10. cs.AI 2026-05-11 reviewed
    AVLLMs store audio-visual data in specialized sink tokens

    Probing Cross-modal Information Hubs in Audio-Visual LLMs

    Chaeyoung Jung +3

  11. cs.AI 2026-05-11 reviewed
    Cross-modal sink tokens store audio-visual info in AVLLMs

    Probing Cross-modal Information Hubs in Audio-Visual LLMs

    Chaeyoung Jung +3

  12. eess.AS 2026-05-11 reviewed
    Flow matching reconstructs sound fields from few microphones

    SF-Flow: Sound field magnitude estimation via flow matching guided by sparse measurements

    Ege Erdem +4

  13. cs.SD 2026-05-11 reviewed
    Acoustic priors sharpen timbre edits in polyphonic music

    Polyphonia: Zero-Shot Timbre Transfer in Polyphonic Music with Acoustic-Informed Attention Calibration

    Boyu Cao +4

  14. cs.CL 2026-05-11 reviewed
    Direct user routing improves spoken QA but risks incoherent interruptions

    How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue

    Hui Lu +6

  15. eess.AS 2026-05-11 reviewed
    Disentangling power doubles audio generation convergence speed

    PoDAR: Power-Disentangled Audio Representation for Generative Modeling

    Alejandro Luebs +7

  16. eess.AS 2026-05-10 reviewed
    Late reverberation tail reveals same-room source location from one mic

    Single-Microphone Audio Point Source Discriminative Localization From Reverberation Late Tail Estimation

    Matthew Maciejewski

  17. eess.AS 2026-05-10 reviewed
    Challenge shows audio deepfake detectors still fail after media changes

    RADAR Challenge 2026: Robust Audio Deepfake Recognition under Media Transformations

    Hieu-Thi Luong +4

  18. eess.AS 2026-05-10 reviewed
    Context model outperforms speech evaluators on appropriateness

    Evaluating the Expressive Appropriateness of Speech in Rich Contexts

    Cheng Gong +28

  19. eess.AS 2026-05-10 reviewed
    Kinetic schedule and moment fix lift zero-shot TTS

    Kinetic-Optimal Scheduling with Moment Correction for Metric-Induced Discrete Flow Matching in Zero-Shot Text-to-Speech

    Dong Yang +4

  20. cs.CL 2026-05-09 reviewed
    Temperature sampling lifts Chinese dialect ASR accuracy

    Dolphin-CN-Dialect: Where Chinese Dialects Matter

    Guanbo Wang +8

  21. eess.AS 2026-05-09 reviewed
    Distillation cuts hallucinations in LM-based speech enhancement

    Reducing Linguistic Hallucination in LM-Based Speech Enhancement via Noise-Invariant Acoustic-Semantic Distillation

    Hang Su +8

  22. eess.AS 2026-05-08 reviewed
    Keyed rotations watermark speech in codec latent spaces

    Latent Secret Spin: Keyed Orthogonal Rotations for Blind Speech Watermarking in Anisotropic Latent Spaces

    Antonio Faonio +4

  23. cs.LG 2026-05-08 reviewed
    Mapping imagined MEG to listened signals decodes unspoken words

    Zero-Shot Imagined Speech Decoding via Imagined-to-Listened MEG Mapping

    Maryam Maghsoudi +1

  24. eess.AS 2026-05-08 reviewed
    Distance model switches from reverberation to delay with calibration

    Dependence on Early and Late Reverberation of Single-Channel Speaker Distance Estimation

    Archontis Politis +2

  25. eess.AS 2026-05-08 reviewed
    Rank metric reveals voice anonymisation leaks EER overlooks

    Evaluating voice anonymisation using similarity rank disclosure

    Dorothea Kolossa +9

  26. cs.CR 2026-05-08 reviewed
    Phase-coded audio watermark verifies at 98% after attacks

    Asymmetric Phase Coding Audio Watermarking

    Amir Ghasemian +3

  27. cs.CL 2026-05-07 reviewed
    MIST benchmark shows LLMs lag on voice IoT tasks

    MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes

    Alexandros Papangelis +5

  28. eess.AS 2026-05-07 reviewed
    Protocol approves audio compression only for worst query families

    Task-Aware Answer Preservation under Audio Compression for Large Audio Language Models

    Amir Ivry

  29. eess.IV 2026-05-07 reviewed
    Neural codec with FFT encoder outperforms tokenizers on sensors

    LiVeAction: a Lightweight, Versatile, and Asymmetric Neural Codec Design for Real-time Operation

    Dan Jacobellis +1

  30. cs.LG 2026-05-07 reviewed
    Weight decay induces Villani coercivity in Transformer losses

    Weight-Decay Turns Transformer Loss Landscapes Villani: Functional-Analytic Foundations for Optimization and Generalization

    Abhijit Das +1

  31. eess.AS 2026-05-07 reviewed
    Compact latent unifies speech understanding and generation

    WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling

    Guanrou Yang +14

  32. eess.AS 2026-05-07 reviewed
    Decomposing interpolants boosts speech enhancement quality

    Predictive-Generative Drift Decomposition for Speech Enhancement and Separation

    Christoph Boeddeker +5

  33. eess.AS 2026-05-07 reviewed
    NDF+ adds control over diffuse sound in virtual microphone outputs

    NDF+: Joint Neural Directional Filtering and Diffuse Sound Extraction

    Emanuël A. P. Habets +3

  34. cs.CL 2026-05-07 reviewed
    Prosody embeddings at input cut speech LLM modality gap

    Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM

    Daxin Tan +4

  35. cs.CL 2026-05-07 reviewed
    Input prosody alignment shrinks speech LLM modality gap

    Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM

    Daxin Tan +4

  36. cs.SD 2026-05-07 reviewed
    0.4B model clones voices across 30 languages without transcripts

    X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning

    Berrak Sisman +12

  37. cs.SD 2026-05-07 reviewed
    0.4B model clones any voice across 30 languages zero-shot

    X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning

    Berrak Sisman +12

  38. eess.AS 2026-05-07 reviewed
    Learned Riemannian costs raise audio distance correlation with human ratings

    Optimal Transport Audio Distance with Learned Riemannian Ground Metrics

    Wonwoo Jeong

  39. eess.AS 2026-05-06 reviewed
    Neural net creates virtual mics to nearly match full-array performance

    Spatial-Magnifier: Spatial upsampling for multichannel speech enhancement

    Ashutosh Pandey +6

  40. cs.SD 2026-05-06 reviewed
    Bangla ASR hits 0.2441 WER after Whisper fine-tuning

    Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote for Bangla Long-Form Speech Recognition and Speaker Diarization

    Ahmed Faizul Haque Dhrubo +6

  41. eess.AS 2026-05-06 reviewed
    Instruction-tuned model matches human audio ratings without retraining

    JASTIN: Aligning LLMs for Zero-Shot Audio and Speech Evaluation via Natural Language Instructions

    Bach Viet Do +4

  42. cs.SD 2026-05-05 reviewed
    Web tool maps ship underwater noise worldwide in near real time

    ShipEcho -- An Interactive Tool for Global Mapping of Underwater Radiated Noise from Vessels

    Đula Nađ +3

  43. cs.SD 2026-05-05 reviewed
    0.1B omni model reaches 0.09 CER in speech-text consistency

    MiniMind-O Technical Report: An Open Small-Scale Speech-Native Omni Model

    Jingyao Gong

  44. eess.AS 2026-05-05 reviewed
    Classical codecs resist noise better than neural ones

    Assessing the Impact of Noise and Speech Enhancement on the Intelligibility of Speech Codecs

    Anjana Rajasekhar +4

  45. eess.AS 2026-05-05 reviewed
    Diffusion model beats top echo canceller with less compute

    DiffVQE: Hybrid Diffusion Voice Quality Enhancement Under Acoustic Echo and Noise

    Ernst Seidel +4

  46. eess.AS 2026-05-05 reviewed
    Entropy minimization decomposes for autoregressive test-time adaptation

    Rethinking Entropy Minimization in Test-Time Adaptation for Autoregressive Models

    Chee-En Yu +3

  47. cs.SD 2026-05-04 reviewed
    Phoneme checks detect emotional deepfakes

    Phoneme-Level Deepfake Detection Across Emotional Conditions Using Self-Supervised Embeddings

    Anderson R. Avila +2

  48. eess.AS 2026-05-04 reviewed
    Simple pitch and noise checks catch 85% of bad voice clones

    Low-Cost Detection of Degraded Voice Clones via Source-Output Acoustic Consistency

    Jana Shokr +3

  49. eess.AS 2026-05-04 reviewed
    Partitioned speech vectors allow searches that ignore speaker or focus on words

    Multi-Axis Speech Similarity via Factor-Partitioned Embeddings

    Jens Edlund +1

  50. eess.AS 2026-05-04 reviewed
    Speech embeddings split by attribute for selective similarity searches

    Multi-Axis Speech Similarity via Factor-Partitioned Embeddings

    Jens Edlund +1