We define a novel concept, extended word alignment, to improve the efficiency of post-editing assistance. Based on extended word alignment, we further propose a novel task, refined word-level quality estimation (QE), which outputs refined tags and word-level correspondences. Compared with original word-level QE, the new task directly points out editing operations and thus improves efficiency. To extract extended word alignment, we adopt a supervised method based on mBERT. To solve refined word-level QE, we first predict original QE tags by training a regression model for sequence tagging based on mBERT and XLM-R. We then refine the original word tags with extended word alignment. In addition, we extract source-gap correspondences and obtain gap tags at the same time. Experiments on two language pairs show the feasibility of our method and suggest directions for further improvement.
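The two-step idea (predict original tags, then refine them with word alignment) can be illustrated with a minimal sketch. The abstract does not specify the refinement rule, so everything here is an assumption made for illustration: the function name `refine_tags`, the tag names `REP` and `DEL`, the 0.5 threshold, and the rule that a BAD target word with an aligned source word is a replacement while an unaligned one is a deletion. Gap tags and source-gap correspondences are not modelled; the sketch only lists source words that have no aligned target word.

```python
from typing import List, Set, Tuple


def refine_tags(
    bad_scores: List[float],
    alignment: Set[Tuple[int, int]],
    n_source: int,
    threshold: float = 0.5,
) -> Tuple[List[str], List[int]]:
    """Hypothetical refinement rule, not the authors' confirmed method.

    bad_scores: one regression score per target word (higher = more likely BAD).
    alignment:  set of (source_index, target_index) pairs.
    n_source:   number of source words.
    """
    aligned_targets = {t for _, t in alignment}
    aligned_sources = {s for s, _ in alignment}

    tags = []
    for j, score in enumerate(bad_scores):
        if score < threshold:
            tags.append("OK")    # original tag OK: keep the word
        elif j in aligned_targets:
            tags.append("REP")   # BAD and aligned: assumed to need replacement
        else:
            tags.append("DEL")   # BAD and unaligned: assumed to need deletion

    # Source words with no aligned target word (assumed insertion candidates).
    uncovered = [i for i in range(n_source) if i not in aligned_sources]
    return tags, uncovered


tags, uncovered = refine_tags([0.1, 0.9, 0.8], {(0, 0), (1, 1)}, n_source=3)
print(tags, uncovered)
```

In this toy input, the third target word is scored BAD and has no aligned source word, while the third source word has no aligned target word; a real system would take both the scores and the alignment from trained models rather than hand-written values.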