Goal-oriented dialogue systems were originally designed as natural language interfaces to a fixed data-set of entities that users might inquire about, further described by domains, slots, and values. As we move towards adaptable dialogue systems, where knowledge about domains, slots, and values may change, there is an increasing need to automatically extract these terms from raw dialogues or related non-dialogue data on a large scale. In this paper, we take an important step in this direction by exploring different features that can enable systems to discover realizations of domains, slots, and values in dialogues in a purely data-driven fashion. The features that we examine stem from word embeddings, language modelling, and the topology of the word embedding space. To examine the utility of each feature set, we train a seed model on the widely used MultiWOZ data-set. We then apply this model to a different corpus, the Schema-Guided Dialogue data-set. Our method outperforms the previously proposed approach, which relies solely on word embeddings. We also demonstrate that each of the features is responsible for discovering different kinds of content. We believe our results warrant further research towards ontology induction, and continued use of topological data analysis in dialogue and natural language processing research.