Don't Go Far Off: An Empirical Study on Neural Poetry Translation

Despite constant improvements in machine translation quality, automatic poetry translation remains a challenging problem, owing both to the lack of open-source parallel poetic corpora and to the intrinsic difficulty of preserving the semantics, style, and figurative nature of poetry. We present an empirical investigation of poetry translation along several dimensions: 1) size and style of training data (poetic vs. non-poetic), including a zero-shot setup; 2) bilingual vs. multilingual learning; and 3) language-family-specific models vs. mixed-multilingual models. To accomplish this, we contribute a parallel dataset of poetry translations for several language pairs. Our results show that multilingual fine-tuning on poetic text significantly outperforms multilingual fine-tuning on non-poetic text that is 35X larger in size, both in terms of automatic metrics (BLEU, BERTScore) and human evaluation metrics such as faithfulness (meaning and poetic style). Moreover, multilingual fine-tuning on poetic data outperforms bilingual fine-tuning on poetic data.

