Interest in dialog systems has grown substantially in the past decade, and with it interest in developing and improving intent classification and slot-filling models, two components commonly used in task-oriented dialog systems. Good evaluation benchmarks are important for comparing and analyzing systems that incorporate such models. Unfortunately, much of the literature in the field analyzes only a relatively small number of benchmark datasets. To promote more robust analyses of task-oriented dialog systems, we have conducted a survey of publicly available datasets for the tasks of intent classification and slot-filling. We catalog the important characteristics of each dataset and discuss the applicability, strengths, and weaknesses of each. Our goal is for this survey to make these datasets more accessible, which we hope will enable their use in future evaluations of intent classification and slot-filling models for task-oriented dialog systems.