We introduce GODEL (Grounded Open Dialogue Language Model), a large pre-trained language model for dialog. In contrast with earlier models such as DialoGPT, GODEL leverages a new phase of grounded pre-training designed to better support adapting GODEL to a wide range of downstream dialog tasks that require information external to the current conversation (e.g., a database or document) to produce good responses. Experiments against an array of benchmarks that encompass task-oriented dialog, conversational QA, and grounded open-domain dialog show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups, in terms of both human and automatic evaluation. A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses (extrinsic evaluation) in addition to their communicative features (intrinsic evaluation). We show that extrinsic evaluation offers improved inter-annotator agreement and correlation with automated metrics. Code and data processing scripts are publicly available.