This paper investigates unsupervised approaches to two central challenges in designing task-oriented dialog schemas: assigning an intent label to each dialog turn (intent clustering) and generating a set of intents from the resulting clusters (intent induction). We postulate that two factors are salient for automatic intent induction: (1) the clustering algorithm used for intent labeling and (2) the user utterance embedding space. We compare existing off-the-shelf clustering models and embeddings on the DSTC11 evaluation. Our extensive experiments yield two important caveats: both the utterance embedding and the clustering method for intent induction must be selected with care. We also show that a pretrained MiniLM with Agglomerative clustering brings significant improvements in NMI, ARI, F1, accuracy, and example coverage on intent induction tasks. The source code for reproduction will be available on GitHub.
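A minimal sketch of the embedding-plus-clustering pipeline described above, assuming the sentence-transformers and scikit-learn packages; the model name, cluster count, and toy utterances are illustrative placeholders rather than the paper's exact experimental setup.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

# Hypothetical user utterances and gold intent labels, for illustration only.
utterances = [
    "I want to book a flight to Boston",
    "Can you reserve a plane ticket for me?",
    "What's the weather like tomorrow?",
    "Will it rain this weekend?",
]
gold_labels = [0, 0, 1, 1]

# 1) Embed utterances with a pretrained MiniLM sentence encoder.
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = encoder.encode(utterances)

# 2) Cluster the embeddings with Agglomerative clustering to induce intents.
clustering = AgglomerativeClustering(n_clusters=2)
pred_labels = clustering.fit_predict(embeddings)

# 3) Score the induced clusters against gold intents with NMI and ARI.
print("NMI:", normalized_mutual_info_score(gold_labels, pred_labels))
print("ARI:", adjusted_rand_score(gold_labels, pred_labels))
```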