pith. machine review for the scientific record.

Title resolution pending

7 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Moshi: a speech-text foundation model for real-time dialogue

eess.AS · 2024-09-17 · accept · novelty 7.0

Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.

BlasBench: An Open Benchmark for Irish Speech Recognition

cs.CL · 2026-04-12 · conditional · novelty 6.0

BlasBench supplies an Irish-aware normalizer and scoring harness that enables reproducible ASR comparisons and exposes a 33-43 point generalization gap for fine-tuned models versus 7-10 points for massively multilingual ones.

Kimi-Audio Technical Report

eess.AS · 2025-04-25 · unverdicted · novelty 5.0

Kimi-Audio is an open-source audio foundation model that achieves state-of-the-art results on speech recognition, audio understanding, question answering, and conversation after pre-training on more than 13 million hours of speech, sound, and music data.

Dolphin-CN-Dialect: Where Chinese Dialects Matter

cs.CL · 2026-05-09 · unverdicted · novelty 4.0

Dolphin-CN-Dialect is a compact ASR model that boosts Chinese dialect accuracy through balanced sampling of rare dialects and character-level tokenization while staying smaller than recent open-source competitors.

citing papers explorer

Showing 7 of 7 citing papers.

  • Moshi: a speech-text foundation model for real-time dialogue eess.AS · 2024-09-17 · accept · none · ref 115

    Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.

  • MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production cs.DC · 2026-05-09 · unverdicted · none · ref 62

    MegaScale-Omni delivers 1.27x-7.57x higher throughput for dynamic multimodal LLM training by decoupling encoder and LLM parallelism, using unified colocation, and applying adaptive workload balancing.

  • BlasBench: An Open Benchmark for Irish Speech Recognition cs.CL · 2026-04-12 · conditional · none · ref 32

    BlasBench supplies an Irish-aware normalizer and scoring harness that enables reproducible ASR comparisons and exposes a 33-43 point generalization gap for fine-tuned models versus 7-10 points for massively multilingual ones.

  • PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer cs.CV · 2026-04-07 · unverdicted · none · ref 80

    PoM is a new linear-complexity token mixer using learned polynomials that matches attention performance in transformers while enabling efficient long-sequence processing.

  • UniPASE: A Generative Model for Universal Speech Enhancement with High Fidelity and Low Hallucinations eess.AS · 2026-04-16 · unverdicted · none · ref 28

    UniPASE extends the PASE framework with DeWavLM-Omni to convert degraded speech into high-fidelity, low-hallucination audio across sampling rates via phonetic enhancement, acoustic adaptation, and multi-rate vocoding.

  • Kimi-Audio Technical Report eess.AS · 2025-04-25 · unverdicted · none · ref 87

    Kimi-Audio is an open-source audio foundation model that achieves state-of-the-art results on speech recognition, audio understanding, question answering, and conversation after pre-training on more than 13 million hours of speech, sound, and music data.

  • Dolphin-CN-Dialect: Where Chinese Dialects Matter cs.CL · 2026-05-09 · unverdicted · none · ref 19

    Dolphin-CN-Dialect is a compact ASR model that boosts Chinese dialect accuracy through balanced sampling of rare dialects and character-level tokenization while staying smaller than recent open-source competitors.