Being able to reply with a relevant, fluent, and informative response is an indispensable requirement for building high-quality conversational agents. To generate better responses, several approaches have been proposed, such as feeding extra information by collecting large-scale datasets with human annotations, designing neural conversational models (NCMs) with complex architectures and loss functions, or filtering out untrustworthy samples based on a dialogue attribute, e.g., Relatedness or Genericness. In this paper, we follow the third research branch and present a data filtering method for open-domain dialogues, which identifies untrustworthy samples in the training data with a quality measure that linearly combines seven dialogue attributes. The attribute weights are obtained via Bayesian Optimization (BayesOpt), which iteratively optimizes an objective function for dialogue generation on the validation set. We then score training samples with the quality measure, sort them in descending order, and filter out those at the bottom. Furthermore, to accelerate the "filter-train-evaluate" iterations that BayesOpt requires on large-scale datasets, we propose a training framework that integrates maximum likelihood estimation (MLE) and negative training (NEG). This framework updates the parameters of a trained NCM on two small sets containing the newly maintained and newly removed samples, respectively: MLE maximizes the log-likelihood of the newly maintained samples, while NEG minimizes the log-likelihood of the newly removed ones. Experimental results on two datasets show that our method effectively identifies untrustworthy samples, and that NCMs trained on the filtered datasets achieve better performance.
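As a rough illustration of the scoring-and-filtering step, the sketch below combines seven attribute scores linearly, ranks training pairs, and drops the lowest-scoring fraction. The attribute names beyond Relatedness and Genericness, the keep ratio, and the helpers inside `objective` are hypothetical placeholders rather than the paper's exact setup; `gp_minimize` from scikit-optimize stands in for the BayesOpt step over attribute weights.

```python
import numpy as np
from skopt import gp_minimize

# Hypothetical attribute names: the abstract names only Relatedness and
# Genericness; the other five are placeholders for the seven attributes.
ATTRIBUTES = ["relatedness", "genericness", "fluency", "informativeness",
              "coherence", "specificity", "diversity"]

def quality(attrs: dict, weights: np.ndarray) -> float:
    """Quality measure: a linear combination of the seven attribute scores."""
    return float(weights @ np.array([attrs[a] for a in ATTRIBUTES]))

def filter_dataset(samples, weights, keep_ratio=0.9):
    """Score samples, sort in descending order of quality, and filter out
    the bottom fraction as untrustworthy (keep_ratio is illustrative)."""
    ranked = sorted(samples, key=lambda s: quality(s["attrs"], weights),
                    reverse=True)
    cut = int(len(ranked) * keep_ratio)
    return ranked[:cut], ranked[cut:]  # (maintained, removed)

def objective(weights):
    """One filter-train-evaluate iteration. `train_samples`,
    `train_and_eval`, and `validation_metric` are hypothetical helpers;
    the metric is negated because gp_minimize minimizes its objective."""
    maintained, removed = filter_dataset(train_samples, np.array(weights))
    return -validation_metric(train_and_eval(maintained))

# BayesOpt over the seven attribute weights on the validation set.
result = gp_minimize(objective, dimensions=[(0.0, 1.0)] * 7, n_calls=30)
best_weights = np.array(result.x)
```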
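The combined update could look like the following PyTorch sketch. The encoder-decoder interface (a forward pass returning per-token logits) and the batch format are assumptions, but the loss mirrors the description above: maximize log-likelihood on the newly maintained pairs and minimize it on the newly removed ones.

```python
import torch
import torch.nn.functional as F

def seq_log_prob(model, src, tgt):
    """Summed log-likelihood of reference responses `tgt` given contexts
    `src`. Assumes model(src, tgt_in) returns (batch, len, vocab) logits."""
    logits = model(src, tgt[:, :-1])
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, tgt[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_logp.sum()

def mle_neg_step(model, maintained, removed, optimizer):
    """One update on an already-trained NCM: MLE on the newly maintained
    set plus negative training (NEG) on the newly removed set, i.e.
    loss = -log p(maintained) + log p(removed)."""
    src_m, tgt_m = maintained
    src_r, tgt_r = removed
    optimizer.zero_grad()
    loss = (-seq_log_prob(model, src_m, tgt_m)
            + seq_log_prob(model, src_r, tgt_r))
    loss.backward()
    optimizer.step()
```

Because each BayesOpt iteration only revisits the two small sets of newly maintained and newly removed samples, the trained NCM is updated incrementally instead of being retrained from scratch on the full filtered dataset.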