Carefully designed schemas describing how to collect and annotate dialog corpora are a prerequisite for building task-oriented dialog systems. In practical applications, manually designing schemas can be error-prone, laborious, iterative, and slow, especially when the schema is complicated. To alleviate this expensive and time-consuming process, we propose an unsupervised approach for slot schema induction from unlabeled dialog corpora. Leveraging in-domain language models and unsupervised parsing structures, our data-driven approach extracts candidate slots without constraints and then applies coarse-to-fine clustering to induce slot types. We compare our method against several strong supervised baselines and show significant performance improvements in slot schema induction on the MultiWOZ and SGD datasets. We also demonstrate the effectiveness of the induced schemas on downstream applications, including dialog state tracking and response generation.
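The abstract does not spell out the clustering procedure. Purely as an illustration of the general idea of coarse-to-fine clustering, the sketch below groups candidate slot-value spans in two passes: a loose cosine-similarity threshold first forms broad clusters, and a stricter threshold then splits each broad cluster into finer slot types. The toy embeddings, the thresholds, the greedy centroid-based clustering, and all names (`threshold_cluster`, `coarse_to_fine`, `TOY_VECTORS`) are hypothetical stand-ins, not the authors' method; in practice the span representations would come from a language model.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def threshold_cluster(items: List[str], vecs: Dict[str, Vector],
                      threshold: float) -> List[List[str]]:
    """Greedy single-pass clustering (a hypothetical stand-in):
    each item joins the most similar existing centroid if the similarity
    is at least `threshold`, otherwise it starts a new cluster."""
    clusters: List[List[str]] = []
    centroids: List[List[float]] = []
    for item in items:
        v = vecs[item]
        best, best_sim = -1, threshold
        for i, c in enumerate(centroids):
            s = cosine(v, c)
            if s >= best_sim:
                best, best_sim = i, s
        if best == -1:
            clusters.append([item])
            centroids.append(list(v))
        else:
            clusters[best].append(item)
            n = len(clusters[best])
            # Running mean keeps the centroid equal to the cluster average.
            centroids[best] = [(c * (n - 1) + x) / n
                               for c, x in zip(centroids[best], v)]
    return clusters


def coarse_to_fine(items: List[str], vecs: Dict[str, Vector],
                   coarse_t: float, fine_t: float) -> List[List[str]]:
    """Coarse pass with a loose threshold, then a stricter pass inside each
    coarse cluster; each fine cluster plays the role of one slot type."""
    result: List[List[str]] = []
    for group in threshold_cluster(items, vecs, coarse_t):
        result.extend(threshold_cluster(group, vecs, fine_t))
    return result


# Hypothetical toy embeddings for candidate spans (not real model outputs).
TOY_VECTORS: Dict[str, Vector] = {
    "cheap": (1.0, 0.1, 0.0),
    "expensive": (0.95, 0.15, 0.0),
    "italian": (0.7, 0.7, 0.0),
    "chinese": (0.65, 0.75, 0.0),
    "north": (0.0, 0.1, 1.0),
    "south": (0.0, 0.15, 0.95),
}

if __name__ == "__main__":
    print(coarse_to_fine(list(TOY_VECTORS), TOY_VECTORS, 0.5, 0.9))
```

With these toy vectors, the coarse pass separates restaurant attributes from locations, and the fine pass splits the attributes into price-like and cuisine-like groups, yielding `[['cheap', 'expensive'], ['italian', 'chinese'], ['north', 'south']]`. Splitting the work into two thresholds is one simple way to avoid choosing a single global cutoff that must be both loose enough to merge paraphrases and strict enough to keep distinct slot types apart.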