Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.
Moshi: a speech-text foundation model for real-time dialogue
7 Pith papers cite this work. Polarity classification is still in progress.
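The summary above names the two ingredients behind Moshi's speech-to-speech framing: parallel audio token streams for both speakers plus a time-aligned text stream (the inner monologue). Below is a minimal Python sketch of that frame-synchronous token interface only; the frame rate, codebook count, vocabulary sizes, and the dummy sampling step are illustrative assumptions, not Moshi's actual implementation.

```python
# Minimal sketch (not Moshi's implementation) of a frame-synchronous,
# multi-stream token interface: at every codec frame the model emits one
# time-aligned text token (the "inner monologue") plus one token per audio
# codebook for each speaker stream. All names and constants are assumptions.
import random
from dataclasses import dataclass

FRAME_RATE_HZ = 12.5   # assumed codec frame rate (80 ms per frame)
NUM_CODEBOOKS = 8      # assumed number of codebooks per audio stream
AUDIO_VOCAB = 2048     # placeholder audio codebook size
TEXT_VOCAB = 32000     # placeholder text vocabulary size

@dataclass
class Frame:
    """Tokens produced for one frame, across the parallel streams."""
    text: int               # inner-monologue text token aligned to this frame
    model_audio: list[int]  # codebook tokens for the model's own speech
    user_audio: list[int]   # codebook tokens for the user's speech stream

def dummy_step(history: list[Frame]) -> Frame:
    """Stand-in for the autoregressive model: a real model would condition on
    the joint history of all streams; here we just sample placeholder tokens."""
    return Frame(
        text=random.randrange(TEXT_VOCAB),
        model_audio=[random.randrange(AUDIO_VOCAB) for _ in range(NUM_CODEBOOKS)],
        user_audio=[random.randrange(AUDIO_VOCAB) for _ in range(NUM_CODEBOOKS)],
    )

def generate(seconds: float) -> list[Frame]:
    """One step per codec frame, so text and audio stay time-aligned and both
    speaker streams advance in lockstep (the full-duplex part)."""
    history: list[Frame] = []
    for _ in range(int(seconds * FRAME_RATE_HZ)):
        history.append(dummy_step(history))
    return history

if __name__ == "__main__":
    frames = generate(seconds=2.0)
    print(len(frames), "frames; first text token:", frames[0].text)
```

The only point the sketch makes is structural: every stream advances together, one group of tokens per frame, which is what permits streaming, full-duplex generation without an explicit turn-taking protocol.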
citing papers explorer
-
MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production
MegaScale-Omni delivers 1.27x-7.57x higher throughput for dynamic multimodal LLM training by decoupling encoder and LLM parallelism, using unified colocation, and applying adaptive workload balancing.
-
BlasBench: An Open Benchmark for Irish Speech Recognition
BlasBench supplies an Irish-aware normalizer and scoring harness that enables reproducible ASR comparisons and exposes a 33-43 point generalization gap for fine-tuned models versus 7-10 points for massively multilingual ones (a toy sketch of the normalize-then-score pattern appears after this list).
-
PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer
PoM is a new linear-complexity token mixer using learned polynomials that matches attention performance in transformers while enabling efficient long-sequence processing (a sketch of the linear-complexity idea appears after this list).
-
UniPASE: A Generative Model for Universal Speech Enhancement with High Fidelity and Low Hallucinations
UniPASE extends the PASE framework with DeWavLM-Omni to convert degraded speech into high-fidelity, low-hallucination audio across sampling rates via phonetic enhancement, acoustic adaptation, and multi-rate vocoding.
-
Kimi-Audio Technical Report
Kimi-Audio is an open-source audio foundation model that achieves state-of-the-art results on speech recognition, audio understanding, question answering, and conversation after pre-training on more than 13 million hours of speech, sound, and music data.
-
Dolphin-CN-Dialect: Where Chinese Dialects Matter
Dolphin-CN-Dialect is a compact ASR model that boosts Chinese dialect accuracy through balanced sampling of rare dialects and character-level tokenization while staying smaller than recent open-source competitors.
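The Dolphin-CN-Dialect entry directly above credits balanced sampling of rare dialects. A common way to realize that idea is temperature-smoothed sampling weights, sketched below; the exponent, dialect names, and utterance counts are made-up illustrations, and this is not claimed to be the paper's exact recipe.

```python
# Hedged sketch of "balanced sampling of rare dialects": temperature-smoothed
# sampling weights (a standard up-sampling trick) so low-resource dialects are
# drawn more often than their raw share of the corpus. The exponent and the
# example counts are assumptions for illustration.
import random

def dialect_sampling_weights(utterance_counts: dict[str, int], alpha: float = 0.5) -> dict[str, float]:
    """Weight each dialect proportionally to count**alpha; alpha < 1 flattens
    the distribution, boosting rare dialects relative to dominant ones."""
    smoothed = {d: c ** alpha for d, c in utterance_counts.items()}
    total = sum(smoothed.values())
    return {d: w / total for d, w in smoothed.items()}

if __name__ == "__main__":
    counts = {"Mandarin": 1_000_000, "Cantonese": 120_000, "Hokkien": 20_000}  # made-up counts
    weights = dialect_sampling_weights(counts)
    print(weights)
    # Draw a batch of dialect labels according to the balanced weights.
    print(random.choices(list(weights), weights=list(weights.values()), k=8))
```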
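For the BlasBench entry, the load-bearing idea is that every system is scored through the same normalizer before error rates are compared. The sketch below shows that normalize-then-score pattern with a plain word error rate; the toy normalization rules stand in for BlasBench's actual Irish-aware rules, which are not reproduced here.

```python
# Generic normalize-then-score sketch (not BlasBench's harness): run reference
# and hypothesis through one shared normalizer, then compute word error rate.
import re

def normalize(text: str) -> str:
    """Toy normalizer: lowercase, drop punctuation, collapse whitespace.
    (A language-aware normalizer would add orthography-specific rules.)"""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)        # keep word characters, incl. accented vowels
    return re.sub(r"\s+", " ", text).strip()

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate = word-level edit distance / reference length."""
    ref, hyp = normalize(reference).split(), normalize(hypothesis).split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[-1][-1] / max(len(ref), 1)

if __name__ == "__main__":
    print(wer("Dia duit, a chara!", "dia dhuit a chara"))  # one substitution -> 0.25
```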
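For the PoM entry, the claim worth unpacking is linear complexity in sequence length. The sketch below is not the paper's PoM operator; it is a generic linear-attention-style layer with a fixed degree-2 polynomial feature map, included only to show why polynomial feature maps let a mixer avoid forming the n x n attention matrix.

```python
# Illustrative linear-cost token mixing with a polynomial feature map
# (a generic stand-in, not the PoM operator from the paper).
import numpy as np

def poly_features(x: np.ndarray) -> np.ndarray:
    """Map each d-dim token to [1, x, all pairwise products x_i * x_j]: a fixed
    degree-2 polynomial feature map whose size does not depend on sequence length."""
    n, d = x.shape
    ones = np.ones((n, 1))
    quad = np.einsum("ni,nj->nij", x, x).reshape(n, d * d)
    return np.concatenate([ones, x, quad], axis=1)

def polynomial_mixer(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Mix tokens in O(n): accumulate phi(k)^T v once (f x d_v), then read it
    out per query with phi(q); no n x n attention matrix is ever formed.
    A real mixer would add learned projections and normalization on top."""
    fq, fk = poly_features(q), poly_features(k)   # (n, f) each
    kv = fk.T @ v                                 # (f, d_v), summed over all tokens
    return fq @ kv                                # (n, d_v)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 16, 4                                  # toy sequence length and width
    q, k, v = rng.normal(size=(3, n, d))
    print(polynomial_mixer(q, k, v).shape)        # (16, 4)
```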