Despite constant improvements in machine translation quality, automatic poetry translation remains a challenging problem due to the lack of open-source parallel poetic corpora and to the intrinsic complexities of preserving the semantics, style, and figurative nature of poetry. We present an empirical investigation of poetry translation along several dimensions: 1) size and style of training data (poetic vs. non-poetic), including a zero-shot setup; 2) bilingual vs. multilingual learning; and 3) language-family-specific models vs. mixed-multilingual models. To accomplish this, we contribute a parallel dataset of poetry translations for several language pairs. Our results show that multilingual fine-tuning on poetic text significantly outperforms multilingual fine-tuning on non-poetic text that is 35X larger, both in terms of automatic metrics (BLEU, BERTScore) and human evaluation metrics such as faithfulness (to meaning and poetic style). Moreover, multilingual fine-tuning on poetic data outperforms \emph{bilingual} fine-tuning on poetic data.
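As an illustration only, the following is a minimal sketch of how the automatic metrics mentioned above (BLEU and BERTScore) could be computed for a set of translated poems, using the sacrebleu and bert-score Python packages; the example sentences and settings are placeholders and do not reflect the paper's actual evaluation pipeline or dataset.

\begin{verbatim}
# Hypothetical sketch: scoring model translations of poems against
# reference translations with BLEU (sacrebleu) and BERTScore (bert-score).
# The sentences below are placeholders, not from the paper's dataset.
import sacrebleu
from bert_score import score

# Model outputs and human reference translations, aligned line by line.
hypotheses = [
    "The moon spills silver on the sleeping sea.",
    "I wandered lonely as a drifting cloud.",
]
references = [
    "The moon pours silver over the sleeping sea.",
    "I wandered lonely as a cloud.",
]

# Corpus-level BLEU (sacrebleu expects a list of reference streams).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")

# BERTScore: contextual-embedding similarity; report the mean F1.
P, R, F1 = score(hypotheses, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.4f}")
\end{verbatim}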