Intent detection and slot filling are two main tasks in natural language understanding and play an essential role in task-oriented dialogue systems. Jointly learning both tasks can improve inference accuracy and is popular in recent works. However, most joint models ignore inference latency and cannot meet the requirements of deploying dialogue systems at the edge. In this paper, we propose a Fast Attention Network (FAN) for joint intent detection and slot filling that guarantees both accuracy and low latency. Specifically, we introduce a clean and parameter-refined attention module to enhance the information exchange between intent and slots, improving semantic accuracy by more than 2%. FAN can be implemented on different encoders and delivers more accurate models at every speed level. Our experiments on the Jetson Nano platform show that FAN infers fifteen utterances per second with only a small accuracy drop, demonstrating its effectiveness and efficiency on edge devices.
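The abstract does not spell out the attention module's internals, so the following is only a minimal PyTorch sketch of the general idea it describes: letting the sentence-level intent feature and the token-level slot features attend to each other before the classifier heads. The class name `IntentSlotAttention`, the single shared projection, and all dimensions are illustrative assumptions, not FAN's actual design.

```python
# Hypothetical sketch of an attention-based intent-slot interaction module.
# Everything here (names, shapes, the fusion scheme) is assumed for
# illustration; the paper's actual parameter-refined module may differ.
import torch
import torch.nn as nn


class IntentSlotAttention(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        # One shared projection keeps the module parameter-light.
        self.proj = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, slot_feats: torch.Tensor, intent_feat: torch.Tensor):
        # slot_feats:  (batch, seq_len, hidden_dim) token-level features
        # intent_feat: (batch, hidden_dim) sentence-level intent feature
        # Score each slot token against the intent vector (scaled dot-product).
        scores = torch.einsum("bsh,bh->bs", self.proj(slot_feats), intent_feat)
        weights = torch.softmax(scores / slot_feats.size(-1) ** 0.5, dim=-1)
        # Summarize slot features under those weights, then fuse the summary
        # back into both task representations via residual additions.
        context = torch.einsum("bs,bsh->bh", weights, slot_feats)
        fused_intent = intent_feat + context
        fused_slots = slot_feats + fused_intent.unsqueeze(1)
        return fused_slots, fused_intent


# Usage: fuse encoder outputs before the intent/slot classifier heads.
enc = torch.randn(2, 16, 256)   # token features from any encoder
cls = enc.mean(dim=1)           # crude sentence-level feature for the demo
fused_slots, fused_intent = IntentSlotAttention(256)(enc, cls)
```

Because the module adds only a single linear layer on top of the encoder, a design in this spirit is compatible with encoders of different sizes, which is consistent with the abstract's claim that FAN trades accuracy against speed across encoder choices.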