Most recent semantic frame parsing systems for spoken language understanding (SLU) are based on recurrent neural networks. These systems perform reasonably well on benchmark SLU datasets such as ATIS and SNIPS, which contain short utterances with relatively simple patterns. However, current semantic frame parsing models lack a mechanism to handle out-of-distribution (\emph{OOD}) patterns and out-of-vocabulary (\emph{OOV}) tokens. In this paper, we introduce a robust semantic frame parsing pipeline that can handle both \emph{OOD} patterns and \emph{OOV} tokens. We also introduce a new, more complex Twitter dataset of long tweets that contain more \emph{OOD} patterns and \emph{OOV} tokens. The new pipeline achieves substantially better results than state-of-the-art baseline SLU models on both the SNIPS dataset and the new Twitter dataset (the new Twitter dataset can be downloaded from https://1drv.ms/u/s!AroHb-W6_OAlavK4begsDsMALfE?e=c8f2XX ). Finally, we build an end-to-end (E2E) application to demonstrate the feasibility of our algorithm and its usefulness in real-world applications.