Pith · machine review for the scientific record

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

fields

cs.CL (2) · cs.SD (1)

years

2026 (3)


representative citing papers

A framework for analyzing concept representations in neural models

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

A new framework shows that concept subspaces are not unique and that estimator choice affects measured containment and disentanglement; LEACE performs well in-distribution but generalizes poorly, and HuBERT encodes phone information as compactly contained and disentangled from speaker information, while speaker information resists compact containment.

BlasBench: An Open Benchmark for Irish Speech Recognition

cs.CL · 2026-04-12 · conditional · novelty 6.0

BlasBench supplies an Irish-aware text normalizer and scoring harness that enable reproducible ASR comparisons, exposing a 33–43-point generalization gap for fine-tuned models versus 7–10 points for massively multilingual ones.
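The normalize-then-score workflow that the BlasBench summary describes can be sketched minimally. This is an illustrative toy, not BlasBench's actual harness: the normalization rules here (Unicode NFC, lowercasing, stripping punctuation while keeping apostrophes and hyphens, which matter for Irish orthography) are assumptions for the sketch, and the WER function is a standard single-row Levenshtein distance over words.

```python
import re
import unicodedata

def normalize(text):
    # Toy normalizer (assumed rules, not BlasBench's): NFC-normalize,
    # lowercase, and replace punctuation with spaces, but keep
    # apostrophes and hyphens since Irish uses them word-internally
    # (e.g. "d'ith"). Python's Unicode \w preserves accented vowels.
    text = unicodedata.normalize("NFC", text).lower()
    text = re.sub(r"[^\w\s'-]", " ", text)
    return text.split()

def wer(ref_words, hyp_words):
    # Word error rate: Levenshtein distance over word sequences,
    # computed with a single rolling row, divided by reference length.
    d = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp_words, 1):
            prev, d[j] = d[j], min(d[j] + 1,        # deletion
                                   d[j - 1] + 1,    # insertion
                                   prev + (r != h)) # substitution
    return d[-1] / max(len(ref_words), 1)
```

Scoring normalized text is what makes cross-model comparisons reproducible: without a shared normalizer, casing and punctuation differences alone shift WER between systems.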

citing papers explorer

Showing 3 of 3 citing papers after filters.

  • Multi-layer attentive probing improves transfer of audio representations for bioacoustics cs.SD · 2026-05-11 · unverdicted · none · ref 11

    Multi-layer attentive probing outperforms last-layer linear probing for transferring audio representations to bioacoustic tasks, indicating that standard evaluation setups may underestimate model quality.

  • A framework for analyzing concept representations in neural models cs.CL · 2026-05-02 · unverdicted · none · ref 205

A new framework shows that concept subspaces are not unique and that estimator choice affects measured containment and disentanglement; LEACE performs well in-distribution but generalizes poorly, and HuBERT encodes phone information as compactly contained and disentangled from speaker information, while speaker information resists compact containment.

  • BlasBench: An Open Benchmark for Irish Speech Recognition cs.CL · 2026-04-12 · conditional · none · ref 30

BlasBench supplies an Irish-aware text normalizer and scoring harness that enable reproducible ASR comparisons, exposing a 33–43-point generalization gap for fine-tuned models versus 7–10 points for massively multilingual ones.
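For the bioacoustics entry above, the contrast between last-layer linear probing and multi-layer attentive probing can be sketched as a forward pass. The shapes, the softmax layer-weighting, and the single-query attention pooling are illustrative assumptions for the sketch, not the paper's exact architecture; the logits and query marked as "would be learned" are randomly initialized here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: L transformer layers, T frames, D feature dims.
L, T, D = 12, 50, 64
hidden_states = rng.standard_normal((L, T, D))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Last-layer linear probing baseline: mean-pool the final layer only,
# then (not shown) fit a linear classifier on this embedding.
last_layer_embedding = hidden_states[-1].mean(axis=0)       # (D,)

# Multi-layer attentive probing (forward pass only):
# 1) softmax-weighted sum over ALL layers, so the probe can draw on
#    intermediate representations the last layer may discard;
layer_logits = rng.standard_normal(L)        # would be learned
layer_weights = softmax(layer_logits)        # sums to 1 over layers
fused = np.tensordot(layer_weights, hidden_states, axes=1)  # (T, D)

# 2) attention pooling over time with a learned query, instead of a
#    uniform mean, so informative frames get more weight.
query = rng.standard_normal(D)               # would be learned
attn = softmax(fused @ query)                # (T,) attention weights
attentive_embedding = attn @ fused           # (D,)
```

The summary's point is that the two embeddings feed the same downstream linear head, so any quality difference comes from the pooling scheme, which is why last-layer-only evaluation can underestimate a model.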