Fine-grained information about translation errors is valuable to the translation evaluation community. Existing approaches cannot consider error position and error type jointly, and therefore fail to integrate the two kinds of error information. In this paper, we propose the Fine-Grained Translation Error Detection (FG-TED) task, which aims to identify both the position and the type of translation errors in a given source-hypothesis sentence pair. We also build an FG-TED model that predicts \textbf{addition} and \textbf{omission} errors, two typical translation accuracy errors. The model follows a word-level classification paradigm and applies shortcut learning reduction to mitigate the influence of monolingual features. In addition, we construct synthetic datasets for model training and reduce labeling disagreements in authoritative datasets, making the experimental benchmark consistent. Experiments show that our model identifies error type and position simultaneously and achieves state-of-the-art results on the restored dataset. Our model also delivers more reliable predictions than existing baselines in low-resource and transfer scenarios. The related datasets and source code will be released in the future.
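To make the task output concrete, the following is a minimal sketch of what word-level FG-TED predictions might look like and how they could be collapsed into typed error spans. It is an illustration under stated assumptions, not the model or data format used in the paper: the tag names (`OK`, `ADD`, `OMIT`), the helper `error_spans`, the example sentences, and the convention that omissions are tagged on source words while additions are tagged on hypothesis words are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical tag set; the paper's actual label inventory is not given here.
OK, ADD, OMIT = "OK", "ADD", "OMIT"


def error_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-word tags into (error_type, start, end) spans, end exclusive."""
    spans: List[Tuple[str, int, int]] = []
    start = None
    # A trailing OK sentinel closes any span that runs to the end of the sentence.
    for i, tag in enumerate(tags + [OK]):
        if start is not None and tag != tags[start]:
            spans.append((tags[start], start, i))
            start = None
        if start is None and tag != OK:
            start = i
    return spans


# Illustrative source-hypothesis pair (made up for this sketch).
src = ["Die", "Katze", "schlief", "auf", "der", "Matte"]
hyp = ["The", "black", "cat", "slept"]

# Assumed convention: omitted content is marked on the source side,
# added content is marked on the hypothesis side.
src_tags = [OK, OK, OK, OMIT, OMIT, OMIT]
hyp_tags = [OK, ADD, OK, OK]

omissions = error_spans(src_tags)  # span covering "auf der Matte"
additions = error_spans(hyp_tags)  # span covering "black"
```

Each span carries both an error type and a position, which is the joint information the FG-TED task asks for.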