Pre-editing is the process of modifying the source text (ST) so that it can be translated by machine translation (MT) with better quality. Despite the unpredictability of black-box neural MT (NMT), pre-editing has been deployed in various practical MT use cases. Although many studies have demonstrated the effectiveness of pre-editing methods in particular settings, a deep understanding of what pre-editing is and how it works for black-box NMT is still lacking. To elicit such an understanding, we extensively investigated human pre-editing practices. We first implemented a protocol to incrementally record the minimum edits for each ST and collected 6,652 instances of pre-editing across three translation directions, two MT systems, and four text domains. We then analysed the instances from three perspectives: the characteristics of the pre-edited STs, the diversity of pre-editing operations, and the impact of the pre-editing operations on NMT outputs. Our findings include the following: (1) making the meaning and syntactic structure of an ST more explicit is more important for obtaining better translations than making the ST shorter and simpler, and (2) although the impact of pre-editing on NMT is generally unpredictable, NMT outputs tend to change in certain ways depending on the type of editing operation.
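The protocol for incrementally recording minimum edits can be illustrated with a small sketch. This is not the authors' actual implementation; it is a minimal, assumption-based example using Python's standard `difflib` to log the character-level operations between successive versions of a source text, which is one plausible way such incremental pre-edits could be captured.

```python
import difflib

def record_edits(versions):
    """Log the non-trivial edit operations between each pair of
    successive ST versions (a hypothetical recording scheme)."""
    log = []
    for before, after in zip(versions, versions[1:]):
        matcher = difflib.SequenceMatcher(a=before, b=after)
        # Keep only the spans that actually changed between versions.
        ops = [(tag, before[i1:i2], after[j1:j2])
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal"]
        log.append(ops)
    return log

# Example: one incremental pre-edit removing a redundant pronoun.
versions = [
    "The meeting it will be held tomorrow.",
    "The meeting will be held tomorrow.",
]
for step, ops in enumerate(record_edits(versions), start=1):
    print(f"edit {step}: {ops}")
```

Each logged entry pairs an operation type (`replace`, `delete`, `insert`) with the affected source and target spans, so a sequence of such entries approximates the minimal edit trajectory for one ST.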