Current research on spoken language understanding (SLU) is largely limited to a simple setting: plain text-based SLU, which takes the user utterance as input and generates its corresponding semantic frame (e.g., intent and slots). Unfortunately, such a simple setting may fail in complex real-world scenarios where an utterance is semantically ambiguous, a case that text-based SLU models cannot resolve. In this paper, we first introduce a new and important task, Profile-based Spoken Language Understanding (ProSLU), which requires a model to rely not only on the plain text but also on supporting profile information to predict the correct intents and slots. To this end, we further introduce a large-scale human-annotated Chinese dataset with over 5K utterances and their corresponding supporting profile information (Knowledge Graph (KG), User Profile (UP), and Context Awareness (CA)). In addition, we evaluate several state-of-the-art baseline models and explore a multi-level knowledge adapter to effectively incorporate profile information. Experimental results reveal that all existing text-based SLU models fail when the utterances are semantically ambiguous, whereas our proposed framework can effectively fuse the supporting information for sentence-level intent detection and token-level slot filling. Finally, we summarize key challenges and suggest directions for future work, which we hope will facilitate further research.
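To make the task setting concrete, the following is a minimal sketch of what a ProSLU example might look like as data: an utterance paired with the three kinds of profile information (KG, UP, CA), mapped to an intent and slots. The class names, field layouts, the sample utterance, and the rule-based `toy_predict` function are all hypothetical illustrations; they are not the dataset's actual schema, and the function is a stand-in for a model, not the multi-level knowledge adapter.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Profile:
    # All three field layouts are assumptions made for illustration only.
    knowledge_graph: Dict[str, List[str]] = field(default_factory=dict)  # KG: entity -> candidate types
    user_profile: Dict[str, float] = field(default_factory=dict)         # UP: type -> preference weight
    context_awareness: Dict[str, str] = field(default_factory=dict)      # CA: situational state


@dataclass
class SemanticFrame:
    intent: str
    slots: Dict[str, str]


def toy_predict(utterance: str, profile: Profile) -> SemanticFrame:
    """Rule-based stand-in for a model: choose the entity type the user prefers most."""
    for entity, types in profile.knowledge_graph.items():
        if entity in utterance:
            # The text alone cannot decide between the candidate types,
            # so the (hypothetical) user preference weights break the tie.
            best = max(types, key=lambda t: profile.user_profile.get(t, 0.0))
            return SemanticFrame(
                intent=f"Play{best.capitalize()}",
                slots={f"{best}Name": entity},
            )
    return SemanticFrame(intent="Unknown", slots={})


# Invented example: the same name could refer to a song, a video, or an audiobook.
profile = Profile(
    knowledge_graph={"Monkey King": ["music", "video", "audiobook"]},
    user_profile={"music": 0.2, "video": 0.7, "audiobook": 0.1},
    context_awareness={"movement_state": "walking"},
)
frame = toy_predict("Play Monkey King", profile)
print(frame)
```

In this sketch the utterance text is identical under every reading, which is why a text-only predictor has nothing to distinguish the candidates; only the profile fields change the output.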