Spoken Language Understanding (SLU) typically comprises an automatic speech recognition (ASR) module followed by a natural language understanding (NLU) module. The two modules process signals in a blocking, sequential fashion, i.e., the NLU often has to wait for the ASR to finish processing an utterance, potentially leading to high latencies that render the spoken interaction less natural. In this paper, we propose recurrent neural network (RNN) based incremental processing for the SLU task of intent detection. The proposed methodology offers lower latencies than a typical SLU system, without any significant reduction in system accuracy. We introduce and analyze different recurrent neural network architectures for incremental and online processing of the ASR transcripts and compare them to existing offline systems. A lexical End-of-Sentence (EOS) detector is proposed for segmenting the transcript stream into sentences for intent classification. Intent detection experiments are conducted on the benchmark ATIS, Snips, and Facebook multilingual task-oriented dialog datasets, modified to emulate a continuous incremental stream of words with no utterance demarcation. We also analyze the prospects of early intent detection, before EOS, with our proposed system.
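The incremental pipeline described above can be sketched as follows. This is a minimal toy illustration, not the paper's system: the embeddings and RNN weights are random rather than learned, the vocabulary uses a simple character-sum hash, and the lexical EOS detector is stood in for by a hypothetical punctuation rule. It only shows the control flow: the RNN state is updated word by word as the stream arrives, and an intent is emitted as soon as EOS fires, rather than waiting for a full utterance.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN, N_INTENTS = 50, 16, 3

# Toy parameters (random here; a real system would learn these).
E = rng.normal(0, 0.1, (VOCAB, HIDDEN))      # word embeddings
W_xh = rng.normal(0, 0.1, (HIDDEN, HIDDEN))  # input-to-hidden weights
W_hh = rng.normal(0, 0.1, (HIDDEN, HIDDEN))  # hidden-to-hidden weights
W_hy = rng.normal(0, 0.1, (HIDDEN, N_INTENTS))  # hidden-to-intent weights

def word_id(word):
    # Character-sum hash as a stand-in for a learned vocabulary lookup.
    return sum(ord(c) for c in word) % VOCAB

def is_eos(word):
    # Hypothetical stand-in for the paper's lexical EOS detector:
    # here, just sentence-final punctuation.
    return word in {".", "?", "!"}

def run_stream(words):
    """Consume a word stream incrementally; classify intent at each EOS."""
    h = np.zeros(HIDDEN)
    intents = []
    for w in words:
        # Incremental RNN update on each arriving word.
        h = np.tanh(E[word_id(w)] @ W_xh + h @ W_hh)
        if is_eos(w):
            intents.append(int(np.argmax(h @ W_hy)))  # classify now, no waiting
            h = np.zeros(HIDDEN)  # reset state for the next sentence
    return intents

stream = "show me flights to boston . book a table ?".split()
print(run_stream(stream))  # one intent id per detected sentence
```

Because classification is triggered by the EOS detector inside the loop, each sentence's intent is available immediately at its final word; early intent detection would simply move that trigger to a point before EOS.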