Multi-layer attentive probing outperforms last-layer linear probing for transferring audio representations to bioacoustic tasks, indicating that standard evaluation setups may underestimate model quality.
Superb: Speech pro- cessing universal performance benchmark
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
Alethia is a pretrained audio encoder using continuous embedding prediction and generative flow-matching reconstruction that outperforms existing speech foundation models on voice deepfake tasks with better robustness and zero-shot generalization.
ULTRAS unifies audio and speech representation learning in a single transformer by applying patch masking to log-mel spectrograms and using a joint spectral-temporal prediction loss.
citing papers explorer
-
Multi-layer attentive probing improves transfer of audio representations for bioacoustics
Multi-layer attentive probing outperforms last-layer linear probing for transferring audio representations to bioacoustic tasks, indicating that standard evaluation setups may underestimate model quality.
-
Alethia: A Foundational Encoder for Voice Deepfakes
Alethia is a pretrained audio encoder using continuous embedding prediction and generative flow-matching reconstruction that outperforms existing speech foundation models on voice deepfake tasks with better robustness and zero-shot generalization.
-
ULTRAS -- Unified Learning of Transformer Representations for Audio and Speech Signals
ULTRAS unifies audio and speech representation learning in a single transformer by applying patch masking to log-mel spectrograms and using a joint spectral-temporal prediction loss.