Machine translation (MT) has almost achieved human parity at sentence-level translation. In response, the MT community has, in part, shifted its focus to document-level translation. However, the development of document-level MT systems is hampered by the lack of parallel document corpora. This paper describes BWB, a large parallel corpus first introduced in Jiang et al. (2022), along with an annotated test set. The BWB corpus consists of Chinese novels translated by experts into English, and the annotated test set is designed to probe the ability of machine translation systems to model various discourse phenomena. Our resource is freely available, and we hope it will serve as a guide and inspiration for more work in document-level machine translation.