Sentence-level (SL) machine translation (MT) has reached acceptable quality for many high-resource languages, but document-level (DL) MT has not, as it is difficult to 1) train, given the scarcity of DL training data, and 2) evaluate, since the main methods and data sets focus on SL evaluation. To address the first issue, we present a document-aligned Japanese-English conversation corpus, including balanced, high-quality business conversation data for tuning and testing. As for the second issue, we manually identify the main areas where SL MT fails to produce adequate translations in the absence of context. We then create an evaluation set in which these phenomena are annotated, to facilitate the automatic evaluation of DL systems. Finally, we train MT models on our corpus to demonstrate how using context leads to improvements.