End-to-end spoken language understanding (SLU) systems that process human-human or human-computer interactions are often context independent and process each turn of a conversation independently. Spoken conversations, on the other hand, are highly context dependent, and the dialog history contains useful information that can improve the processing of each conversational turn. In this paper, we investigate the importance of dialog history and how it can be effectively integrated into end-to-end SLU systems. While processing a spoken utterance, our proposed RNN transducer (RNN-T) based SLU model has access to its dialog history in the form of decoded transcripts and SLU labels of previous turns. We encode the dialog history as BERT embeddings and use them as an additional input to the SLU model, along with the speech features of the current utterance. We evaluate our approach on a recently released spoken dialog data set, the HarperValleyBank corpus. We observe significant improvements, 8% for dialog action and 30% for caller intent recognition, in comparison to a competitive context-independent end-to-end baseline system.
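To make the history-conditioning idea concrete, below is a minimal sketch (not the authors' implementation) of how a fixed-size BERT embedding of the dialog history could be fused with the acoustic features of the current turn before the RNN-T encoder. The class name HistoryConditionedEncoder, the frame-wise concatenation, the use of BERT's pooled output, and all dimensions are illustrative assumptions.

```python
# Hypothetical sketch: condition an RNN-T style acoustic encoder on a BERT
# embedding of the dialog history (decoded transcripts + SLU labels of
# previous turns). Not the paper's actual code.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class HistoryConditionedEncoder(nn.Module):
    """Acoustic encoder whose per-frame inputs are augmented with a
    fixed-size embedding of the dialog history."""

    def __init__(self, feat_dim=80, hist_dim=768, hidden_dim=512, num_layers=4):
        super().__init__()
        self.hist_proj = nn.Linear(hist_dim, hidden_dim)   # project BERT embedding
        self.rnn = nn.LSTM(feat_dim + hidden_dim, hidden_dim,
                           num_layers=num_layers, batch_first=True)

    def forward(self, speech_feats, hist_emb):
        # speech_feats: (B, T, feat_dim) acoustic features of the current turn
        # hist_emb:     (B, hist_dim) BERT embedding of the dialog history
        hist = self.hist_proj(hist_emb).unsqueeze(1)        # (B, 1, hidden_dim)
        hist = hist.expand(-1, speech_feats.size(1), -1)    # repeat over frames
        fused = torch.cat([speech_feats, hist], dim=-1)     # frame-wise concatenation
        enc_out, _ = self.rnn(fused)                        # (B, T, hidden_dim)
        return enc_out                                      # would feed the RNN-T joint network


# Encode the dialog history (previous transcripts and SLU labels) with BERT.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

# Illustrative history string; the label format here is an assumption.
history = "agent: how can i help you? caller: what time do you close? [intent: get_branch_hours]"
with torch.no_grad():
    toks = tokenizer(history, return_tensors="pt", truncation=True)
    hist_emb = bert(**toks).pooler_output                  # (1, 768)

encoder = HistoryConditionedEncoder()
speech_feats = torch.randn(1, 200, 80)                      # dummy log-mel features for one utterance
enc_out = encoder(speech_feats, hist_emb)
print(enc_out.shape)                                        # torch.Size([1, 200, 512])
```

Concatenating a repeated history vector to every acoustic frame is only one plausible fusion strategy; attention over history embeddings or injecting the embedding into the prediction network are equally reasonable alternatives under the same overall idea.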