SUPERB: Speech processing Universal PERformance Benchmark

Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y Lin, Andy T Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, et al · 2021 · arXiv 2105.01051

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Multi-layer attentive probing improves transfer of audio representations for bioacoustics

cs.SD · 2026-05-11 · unverdicted · novelty 7.0

Multi-layer attentive probing outperforms last-layer linear probing for transferring audio representations to bioacoustic tasks, indicating that standard evaluation setups may underestimate model quality.

Alethia: A Foundational Encoder for Voice Deepfakes

cs.SD · 2026-04-30 · unverdicted · novelty 6.0

Alethia is a pretrained audio encoder using continuous embedding prediction and generative flow-matching reconstruction that outperforms existing speech foundation models on voice deepfake tasks with better robustness and zero-shot generalization.

ULTRAS -- Unified Learning of Transformer Representations for Audio and Speech Signals

eess.AS · 2026-04-08 · unverdicted · novelty 5.0

ULTRAS unifies audio and speech representation learning in a single transformer by applying patch masking to log-mel spectrograms and using a joint spectral-temporal prediction loss.

citing papers explorer

Showing 3 of 3 citing papers.

Multi-layer attentive probing improves transfer of audio representations for bioacoustics
cs.SD · 2026-05-11 · unverdicted · none · ref 9
Alethia: A Foundational Encoder for Voice Deepfakes
cs.SD · 2026-04-30 · unverdicted · none · ref 45
ULTRAS -- Unified Learning of Transformer Representations for Audio and Speech Signals
eess.AS · 2026-04-08 · unverdicted · none · ref 17
