This work investigates unsupervised approaches to two central challenges in designing task-oriented dialog schemas: assigning an intent label to each dialog turn (intent clustering) and generating a set of intents from the resulting clusters (intent induction). We posit that two factors are salient for automatic intent induction: (1) the clustering algorithm used for intent labeling and (2) the user utterance embedding space. We compare existing off-the-shelf clustering models and embeddings using the DSTC11 evaluation. Our extensive experiments demonstrate that the utterance embedding and the clustering method should be selected jointly and with care in the intent induction task. We also show that pretrained MiniLM combined with agglomerative clustering yields significant improvements in NMI, ARI, F1, accuracy, and example coverage on intent induction tasks. The source code is available at https://github.com/Jeiyoon/dstc11-track2.
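The pipeline summarized above (embed each utterance, cluster the embeddings, score the clusters against gold intents) can be illustrated with a minimal, dependency-free sketch. It is not the implementation from the linked repository: the three-dimensional vectors are hand-made stand-ins for MiniLM sentence embeddings, the average-linkage criterion and cosine distance are assumptions, and the names `agglomerative`, `nmi`, `vectors`, and `gold` are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns one label per vector."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(cosine_distance(vectors[i], vectors[j])
                            for i in clusters[a] for j in clusters[b])
                dist = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def nmi(true, pred):
    """Normalized mutual information, arithmetic-mean normalization."""
    n = len(true)

    def entropy(labels):
        return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

    count_true, count_pred = Counter(true), Counter(pred)
    joint = Counter(zip(true, pred))
    mi = sum(c / n * math.log(c * n / (count_true[t] * count_pred[p]))
             for (t, p), c in joint.items())
    denom = (entropy(true) + entropy(pred)) / 2
    return mi / denom if denom > 0 else 1.0


# Toy stand-ins for utterance embeddings (assumption: a real run would use
# MiniLM sentence embeddings instead of these hand-made vectors).
vectors = [
    [0.9, 0.1, 0.0],  # "check my balance"
    [0.8, 0.2, 0.1],  # "what is my account balance"
    [0.1, 0.9, 0.1],  # "I lost my card"
    [0.0, 0.8, 0.2],  # "my card was stolen"
    [0.1, 0.1, 0.9],  # "transfer money"
    [0.2, 0.0, 0.8],  # "send money to a friend"
]
gold = ["balance", "balance", "card", "card", "transfer", "transfer"]

labels = agglomerative(vectors, n_clusters=3)
score = nmi(gold, labels)
print(labels, round(score, 3))
```

Swapping the embedding source or the linkage criterion changes only the inputs to `agglomerative`, which is why the two factors named above (embedding space and clustering algorithm) can be varied independently and compared on the same metric.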