Customer feedback can be an important signal for improving commercial machine translation systems. One way to fix specific translation errors is to remove the related erroneous training instances and then re-train the machine translation system, which we refer to as instance-specific data filtering. Influence functions (IF) have been shown to be effective at finding such relevant training examples for classification tasks such as image classification, toxic speech detection, and entailment. Given a probing instance, IF find influential training examples by measuring the similarity between the probing instance and a set of training examples in gradient space. In this work, we examine the use of influence functions for Neural Machine Translation (NMT). We propose two effective extensions to a state-of-the-art influence function and demonstrate, on the sub-problem of copied training examples, that IF can be applied more generally than handcrafted regular expressions.
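To make the idea of similarity in gradient space concrete, the following is a minimal sketch under stated assumptions. It is not the influence function studied in this work, nor either of the proposed extensions: it uses a toy logistic regression model in place of an NMT system and plain cosine similarity between per-example loss gradients. The names `per_example_grad` and `gradient_similarity`, the weights, and the data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def per_example_grad(w, x, y):
    """Gradient of the logistic loss for one (x, y) pair with respect to w."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))  # predicted probability of label 1
    return (p - y) * x

def gradient_similarity(w, probe, train_set):
    """Cosine similarity between the probe gradient and each training gradient."""
    g_probe = per_example_grad(w, *probe)
    scores = []
    for x, y in train_set:
        g = per_example_grad(w, x, y)
        denom = np.linalg.norm(g) * np.linalg.norm(g_probe)
        scores.append(float(g @ g_probe / denom) if denom > 0 else 0.0)
    return np.array(scores)

# Hypothetical toy model and data (assumptions, not from the paper).
w = np.array([0.5, -0.25])
train_set = [
    (np.array([1.0, 2.0]), 1.0),   # identical to the probe
    (np.array([1.0, 2.0]), 0.0),   # same input, opposite label
    (np.array([-2.0, 0.5]), 1.0),  # unrelated input
]
probe = (np.array([1.0, 2.0]), 1.0)

scores = gradient_similarity(w, probe, train_set)
ranking = np.argsort(-scores)  # training indices, most similar first
print(scores, ranking)
```

In this toy setting, the training example that matches the probe receives the highest score, and the example with the same input but the opposite label receives the lowest. For instance-specific data filtering, the top-ranked examples would be the candidates for removal before re-training.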