Self-supervised Dialogue Learning for Spoken Conversational Question Answering

Chenyu You; Nuo Chen; Yuexian Zou

arxiv: 2106.02182 · v3 · pith:WD27PASGnew · submitted 2021-06-04 · cs.CL · cs.AI · cs.LG · eess.AS

Nuo Chen, Chenyu You, Yuexian Zou

classification: cs.CL, cs.AI, cs.LG, eess.AS

keywords: spoken, learning, question, dialogue, self-supervised, answering, conversational, scqa

In spoken conversational question answering (SCQA), the answer to a question is generated by retrieving and then analyzing a fixed spoken document that includes multi-part conversations. Most SCQA systems consider only retrieving information from ordered utterances. However, the sequential order of dialogue is important for building a robust SCQA system, and changes in utterance order can severely degrade a corpus, leaving it low-quality and incoherent. To this end, we introduce a self-supervised learning approach comprising three auxiliary tasks (incoherence discrimination, insertion detection, and question prediction) to explicitly capture coreference resolution and dialogue coherence among spoken documents. Specifically, we design a joint learning framework in which the auxiliary self-supervised tasks guide pre-trained SCQA systems toward more coherent and meaningful spoken dialogue learning. We also use the proposed self-supervised tasks to capture intra-sentence coherence. Experimental results demonstrate that our method provides more coherent, meaningful, and appropriate responses and yields performance gains over the original pre-trained language models. Our method achieves state-of-the-art results on the Spoken-CoQA dataset.
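The abstract names the auxiliary tasks but does not say how their training examples are built or how the losses are combined. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes incoherence discrimination can be posed as detecting a swapped pair of utterances, insertion detection as locating a foreign utterance, and joint learning as a weighted sum of losses. The function names, the 50% corruption rate, and the `weight` value are all hypothetical; question prediction is omitted.

```python
import random
from typing import List, Sequence, Tuple


def incoherence_example(dialogue: Sequence[str],
                        rng: random.Random) -> Tuple[List[str], int]:
    """Assumed setup: label 0 = original order, 1 = two utterances swapped."""
    turns = list(dialogue)
    # Keep roughly half the examples intact (hypothetical rate).
    if len(turns) < 2 or rng.random() < 0.5:
        return turns, 0
    i, j = rng.sample(range(len(turns)), 2)  # two distinct positions
    turns[i], turns[j] = turns[j], turns[i]
    return turns, 1


def insertion_example(dialogue: Sequence[str], distractor: str,
                      rng: random.Random) -> Tuple[List[str], int]:
    """Assumed setup: insert a foreign utterance; the label is its index."""
    turns = list(dialogue)
    pos = rng.randrange(len(turns) + 1)
    turns.insert(pos, distractor)
    return turns, pos


def joint_loss(qa_loss: float, aux_losses: Sequence[float],
               weight: float = 0.1) -> float:
    """Assumed combination: main QA loss plus weighted auxiliary losses."""
    return qa_loss + weight * sum(aux_losses)


if __name__ == "__main__":
    rng = random.Random(0)
    dialogue = ["Who wrote it?", "Alice did.", "When?", "In 2020."]
    print(incoherence_example(dialogue, rng))
    print(insertion_example(dialogue, "It rained yesterday.", rng))
    print(joint_loss(1.0, [0.5, 0.5]))
```

In practice the scalar losses would be replaced by tensors from whichever training framework is in use; the sketch only illustrates how labels fall out of the corruption step without manual annotation.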

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

NeuroSonic: Conditional Flow Matching for EEG-to-Speech Reconstruction
cs.LG 2026-06 unverdicted novelty 5.0

NeuroSonic introduces a conditional flow-matching framework that learns a deterministic transport from noise to speech conditioned on EEG, reporting up to 26.3% gains in perceptual quality over GAN, diffusion, and mea...