We present our work on Track 2 of the Dialog System Technology Challenges 11 (DSTC11). DSTC11 Track 2 aims to provide a benchmark for zero-shot, cross-domain intent-set induction. In the absence of an in-domain training dataset, robust utterance representations that transfer across domains are necessary to induce users' intents. To achieve this, we leveraged a multi-domain dialogue dataset to fine-tune the language model and proposed extracting verb-object pairs to remove the artifacts of unnecessary information. Furthermore, we devised a method that generates a name for each cluster to make the clustering results explainable. Our approach achieved third place in precision and outperformed the baseline model in accuracy and normalized mutual information (NMI) across various domain datasets.
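To illustrate the verb-object pair extraction mentioned above, the following is a minimal sketch using spaCy's dependency parser; the paper does not specify the exact extraction pipeline, so the model choice (en_core_web_sm) and the use of the `dobj` relation are assumptions for illustration only.

```python
# Minimal sketch of verb-object pair extraction from an utterance.
# Assumes spaCy with the en_core_web_sm model; this is an illustration,
# not the authors' exact pipeline.
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_verb_object_pairs(utterance: str) -> list[tuple[str, str]]:
    """Return (verb, object) lemma pairs found by dependency parsing."""
    doc = nlp(utterance)
    pairs = []
    for token in doc:
        # A direct object whose syntactic head is a verb forms one pair.
        if token.dep_ == "dobj" and token.head.pos_ == "VERB":
            pairs.append((token.head.lemma_, token.lemma_))
    return pairs

print(extract_verb_object_pairs("I want to book a flight to Boston."))
# e.g. [('book', 'flight')]
```

Keeping only such pairs strips away filler tokens (greetings, politeness markers, domain-specific entities), which is the intuition behind using them as a domain-agnostic signal for intent clustering.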