Recent advances in neural machine translation have highlighted the importance of quality estimation (QE), and research on QE has progressed steadily. QE aims to automatically predict the quality of machine translation (MT) output without reference sentences. Despite its high practical utility, manual QE data creation has several limitations: it incurs non-trivial costs because it requires translation experts, and it is difficult to scale the data or extend it to new languages. To tackle these limitations, we present QUAK, a Korean-English synthetic QE dataset generated in a fully automatic manner. It consists of three sub-datasets, QUAK-M, QUAK-P, and QUAK-H, produced through three strategies that are relatively free from language constraints. Because each strategy requires no human effort, which facilitates scalability, we scale our data up to 1.58M for QUAK-P and QUAK-H and 6.58M for QUAK-M. In our experiments, we quantitatively analyze word-level QE results from several angles and perform statistical analysis. Moreover, we show that efficiently scaled data also improves performance: we observe meaningful gains on QUAK-M and QUAK-P as the data grows to 1.58M.