pith. machine review for the scientific record.

Common Voice: A massively-multilingual speech corpus

9 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

High Fidelity Neural Audio Compression

eess.AS · 2022-10-24 · accept · novelty 7.0

EnCodec is an end-to-end trained streaming neural audio codec that uses a single multiscale spectrogram discriminator and a gradient-normalizing loss balancer to achieve higher fidelity than prior methods at the same bitrates for 24 kHz mono and 48 kHz stereo audio.

BlasBench: An Open Benchmark for Irish Speech Recognition

cs.CL · 2026-04-12 · conditional · novelty 6.0

BlasBench supplies an Irish-aware normalizer and scoring harness that enables reproducible ASR comparisons and exposes a 33-43 point generalization gap for fine-tuned models versus 7-10 points for massively multilingual ones.

Kimi-Audio Technical Report

eess.AS · 2025-04-25 · unverdicted · novelty 5.0

Kimi-Audio is an open-source audio foundation model that achieves state-of-the-art results on speech recognition, audio understanding, question answering, and conversation after pre-training on more than 13 million hours of speech, sound, and music data.

citing papers explorer

Showing 4 of 4 citing papers after filters.

  • High Fidelity Neural Audio Compression eess.AS · 2022-10-24 · accept · none · ref 2

    EnCodec is an end-to-end trained streaming neural audio codec that uses a single multiscale spectrogram discriminator and a gradient-normalizing loss balancer to achieve higher fidelity than prior methods at the same bitrates for 24 kHz mono and 48 kHz stereo audio.

  • HARNESS: Lightweight Distilled Arabic Speech Foundation Models eess.AS · 2026-03-31 · accept · none · ref 2

    HARNESS introduces Arabic-centric speech foundation models that achieve high efficiency and performance through iterative self-distillation and PCA-based signal compression.

  • Kimi-Audio Technical Report eess.AS · 2025-04-25 · unverdicted · none · ref 1

    Kimi-Audio is an open-source audio foundation model that achieves state-of-the-art results on speech recognition, audio understanding, question answering, and conversation after pre-training on more than 13 million hours of speech, sound, and music data.

  • In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word Level Timestamp Predictions eess.AS · 2026-04-14 · unverdicted · none · ref 31

    Lightweight training strategies allow speech-aware LLMs to output accurate word timestamps alongside ASR transcripts while also improving recognition quality across datasets.