We examine how minimal deletions from the source text can induce rare but severe errors in character-based English-Chinese and Chinese-English in-domain neural machine translation. We find that deleting a single character can induce severe errors in the translation. We categorize these errors and compare the effects of deleting single characters versus single words. We also examine how training data size affects the number and types of pathological cases induced by these minimal perturbations, finding significant variation.
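The perturbation procedure described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: it enumerates all single-character and single-word deletion variants of a source sentence, which would then be fed to the NMT model and compared against the translation of the unperturbed source.

```python
# Minimal sketch (assumption: not the authors' actual code) of generating
# single-character and single-word deletion perturbations of a source sentence.

def delete_char(text: str, i: int) -> str:
    """Return `text` with the character at index i removed."""
    return text[:i] + text[i + 1:]

def char_deletions(text: str) -> list[str]:
    """All variants of `text` with exactly one character deleted."""
    return [delete_char(text, i) for i in range(len(text))]

def word_deletions(text: str) -> list[str]:
    """All variants of `text` with exactly one whitespace-separated word deleted.

    Note: for Chinese source text, which is not whitespace-segmented,
    a word segmenter would be needed instead of split().
    """
    words = text.split()
    return [" ".join(words[:i] + words[i + 1:]) for i in range(len(words))]

# Each perturbed variant would then be translated by the NMT system, and the
# output compared against the translation of the unperturbed source to flag
# severe (pathological) changes.
source = "The cat sat on the mat"
print(len(char_deletions(source)))  # one variant per character
print(word_deletions(source)[0])    # first word removed
```

A downstream severity check (e.g. comparing translation output before and after the deletion) is deliberately omitted here, since the abstract does not specify how severe errors were detected.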