pith. machine review for the scientific record.

Mini-omni: Language models can hear, talk while thinking in streaming

9 Pith papers cite this work. Polarity classification is still indexing.

Citing papers by year: 2026 (7), 2025 (2)

representative citing papers

Kimi-Audio Technical Report

eess.AS · 2025-04-25 · unverdicted · novelty 5.0

Kimi-Audio is an open-source audio foundation model that achieves state-of-the-art results on speech recognition, audio understanding, question answering, and conversation after pre-training on more than 13 million hours of speech, sound, and music data.

Qwen2.5-Omni Technical Report

cs.CL · 2025-03-26 · conditional · novelty 5.0

Qwen2.5-Omni presents a multimodal model with block-wise encoders, TMRoPE position embeddings, and a Thinker-Talker architecture that enables simultaneous text and streaming speech generation while matching text performance on reasoning benchmarks.

PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory

cs.AI · 2026-04-09 · unverdicted · novelty 4.0

PASK introduces the DD-MM-PAS paradigm for streaming proactive agents with intent-aware detection, hybrid memory modeling, and a new real-world benchmark where the IntentFlow model matches top LLMs on latency while finding deeper intents.

citing papers explorer

Showing 1 of 1 citing paper after filters.

  • Qwen2.5-Omni Technical Report cs.CL · 2025-03-26 · conditional · none · ref 38
