A full-duplex speech dialogue scheme based on large language model

2024 · arXiv 2405.19487

5 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action

eess.AS · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

DuplexSLA introduces a three-channel full-duplex architecture that synchronizes continuous user audio, discrete assistant audio, and rate-limited textual actions inside a single backbone for native turn-taking and in-conversation tool use.

VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

VITA-QinYu is the first expressive end-to-end spoken language model supporting role-playing and singing alongside conversation, trained on 15.8K hours of data and outperforming prior models on expressiveness and conversational benchmarks.

Moshi: a speech-text foundation model for real-time dialogue

eess.AS · 2024-09-17 · accept · novelty 7.0

Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.

LMPAN: A Lightweight Multi-Path Alignment Network for Joint Full-Duplex Acoustic Echo Cancellation and Noise Suppression

eess.AS · 2026-07-02 · unverdicted · novelty 5.0

LMPAN is a 480K-parameter network using multi-path alignment, attention integration, and dynamic post-filtering that matches larger models on joint AEC and NS while supporting real-time inference.

DuplexOmni: Real-Time Listening, Seeing, Thinking, and Speaking for Full-Duplex Interaction

cs.HC · 2026-06-08 · unverdicted · novelty 5.0

DuplexOmni achieves real-time full-duplex multimodal interaction by separating an interaction layer from a pluggable thinking layer, supported by a Writer-Director pipeline for continuous-interaction training data.

citing papers explorer

Showing 5 of 5 citing papers.

DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action

eess.AS · 2026-05-20 · unverdicted · none · ref 1 · 2 links

VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing

cs.CL · 2026-05-07 · unverdicted · none · ref 52

Moshi: a speech-text foundation model for real-time dialogue

eess.AS · 2024-09-17 · accept · none · ref 104

LMPAN: A Lightweight Multi-Path Alignment Network for Joint Full-Duplex Acoustic Echo Cancellation and Noise Suppression

eess.AS · 2026-07-02 · unverdicted · none · ref 6

DuplexOmni: Real-Time Listening, Seeing, Thinking, and Speaking for Full-Duplex Interaction

cs.HC · 2026-06-08 · unverdicted · none · ref 21
