Intent understanding plays an important role in dialog systems and is typically formulated as a supervised learning problem. However, designing the intents for a new domain from scratch is challenging and time-consuming, and usually requires substantial manual effort from domain experts. This paper presents an unsupervised two-stage approach that discovers intents and automatically generates meaningful intent labels from a collection of unlabeled utterances in a domain. In the first stage, we aim to generate a set of semantically coherent clusters in which the utterances within each cluster convey the same intent. We obtain utterance representations from various pre-trained sentence embeddings and propose a balanced score metric to determine the optimal number of clusters in K-means clustering for balanced datasets. In the second stage, the objective is to generate an intent label automatically for each cluster. We extract an ACTION-OBJECT pair from each utterance using a dependency parser and take the most frequent pair within each cluster, e.g., book-restaurant, as the generated intent label. We empirically show that the proposed unsupervised approach generates meaningful intent labels automatically and achieves high precision and recall in utterance clustering and intent discovery.
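The second-stage labeling step described above can be illustrated with a minimal sketch. It assumes that cluster assignments (for example, from K-means) and one ACTION-OBJECT pair per utterance have already been computed upstream; the function name `label_clusters` and the input format are hypothetical and are not taken from the paper. The sketch covers only the majority vote within each cluster, not the dependency parsing or the balanced score.

```python
from collections import Counter, defaultdict


def label_clusters(cluster_ids, pairs):
    """Return {cluster_id: "action-object"} from the most frequent pair per cluster.

    cluster_ids: one cluster id per utterance (assumed to come from clustering).
    pairs: one (action, object) tuple per utterance, or None when no pair
           could be extracted for that utterance (assumed input format).
    """
    counts = defaultdict(Counter)
    for cid, pair in zip(cluster_ids, pairs):
        if pair is not None:  # skip utterances without an extracted pair
            counts[cid][pair] += 1

    labels = {}
    for cid, counter in counts.items():
        # most_common(1) yields the single highest-count pair in the cluster
        (action, obj), _ = counter.most_common(1)[0]
        labels[cid] = f"{action}-{obj}"
    return labels


# Toy input: cluster 0 holds three utterances, cluster 1 holds two.
cluster_ids = [0, 0, 0, 1, 1]
pairs = [
    ("book", "restaurant"),
    ("book", "restaurant"),
    ("find", "restaurant"),
    ("play", "music"),
    None,
]
labels = label_clusters(cluster_ids, pairs)
print(labels)  # {0: 'book-restaurant', 1: 'play-music'}
```

In this sketch, a cluster in which no utterance yields a pair receives no label; how such clusters should be handled is left open here.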