Zero/few-shot transfer to unseen services is a critical challenge in task-oriented dialogue research. The Schema-Guided Dialogue (SGD) dataset introduced a paradigm for enabling models to support an unlimited number of services without additional data collection or re-training through the use of schemas. Schemas describe APIs in natural language, which models consume to understand the services they need to support. However, the impact of the choice of language in these schemas on model performance remains unexplored. We address this by releasing SGD-X, a benchmark for measuring the robustness of dialogue systems to linguistic variations in schemas. SGD-X extends the SGD dataset with crowdsourced variants for every schema, where variants are semantically similar yet stylistically diverse. We evaluate two top-performing dialogue state tracking models on SGD-X and observe that neither generalizes well across schema variants, as measured by joint goal accuracy and a novel metric quantifying schema sensitivity. Finally, we present a simple, model-agnostic data augmentation method to improve schema robustness and zero-shot generalization to unseen services.