In spoken conversational question answering (SCQA), the answer to a question is generated by retrieving and then analyzing a fixed spoken document, including multi-part conversations. Most SCQA systems have considered only retrieving information from ordered utterances. However, the sequential order of dialogue is important for building a robust SCQA system, and changes in utterance order can result in low-quality, incoherent corpora. To this end, we introduce a self-supervised learning approach, comprising incoherence discrimination, insertion detection, and question prediction, to explicitly capture coreference resolution and dialogue coherence among spoken documents. Specifically, we design a joint learning framework in which the auxiliary self-supervised tasks guide the pre-trained SCQA systems toward more coherent and meaningful spoken dialogue learning. We also use the proposed self-supervised learning tasks to capture intra-sentence coherence. Experimental results demonstrate that our proposed method provides more coherent, meaningful, and appropriate responses, yielding superior performance gains compared to the original pre-trained language models. Our method achieves state-of-the-art results on the Spoken-CoQA dataset.
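The abstract names the auxiliary tasks but does not say how their training examples are built. As a rough illustration only, the sketch below shows one plausible way to derive examples for incoherence discrimination (reordered utterances) and insertion detection (a foreign utterance placed in the dialogue). The function names, the swap-based corruption, and the use of the insertion index as the target are all assumptions made for this example, not the authors' procedure. Question prediction is left out because the abstract gives too little detail to sketch it.

```python
import random
from typing import List, Tuple


def make_incoherent(utterances: List[str], rng: random.Random) -> List[str]:
    """Return a copy of the dialogue whose order differs from the original.

    Hypothetical corruption: swap random pairs of positions until the
    sequence differs from the original order.
    """
    if len(set(utterances)) < 2:
        raise ValueError("need at least two distinct utterances")
    corrupted = list(utterances)
    while corrupted == list(utterances):
        i, j = rng.sample(range(len(corrupted)), 2)
        corrupted[i], corrupted[j] = corrupted[j], corrupted[i]
    return corrupted


def insert_distractor(
    utterances: List[str], distractor: str, rng: random.Random
) -> Tuple[List[str], int]:
    """Insert an utterance from another dialogue at a random position.

    Returns the new dialogue and the index of the inserted utterance,
    which a detector could be trained to predict (an assumption).
    """
    pos = rng.randrange(len(utterances) + 1)
    dialogue = list(utterances)
    return dialogue[:pos] + [distractor] + dialogue[pos:], pos


if __name__ == "__main__":
    rng = random.Random(0)
    dialogue = ["Who wrote it?", "Jane did.", "When?", "In 1990."]
    print(make_incoherent(dialogue, rng))
    print(insert_distractor(dialogue, "The weather is nice.", rng))
```

In a joint learning setup of the kind the abstract describes, the losses on such corrupted dialogues would be added to the main question answering loss; how the terms are weighted is not stated in the source.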