Building Machine Translation (MT) systems for low-resource languages remains challenging. For many language pairs, parallel data are not widely available, and in such cases MT models do not achieve results comparable to those obtained for high-resource languages. When data are scarce, making optimal use of the limited material available is of paramount importance. To that end, in this paper we propose employing the same parallel sentences multiple times, changing only the way the words are split each time. For this purpose we use several Byte Pair Encoding models, each configured with a different number of merge operations. In our experiments, we use this technique to expand the available data and improve an MT system for a low-resource language pair, namely English-Esperanto. As an additional contribution, we make available a set of English-Esperanto parallel data in the literary domain.
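The core idea, producing several different segmentations of the same text by varying the number of BPE merge operations, can be illustrated with a toy BPE implementation. This is a minimal sketch with assumed names and a made-up corpus, not the paper's actual code or data:

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn up to num_merges BPE merge operations from a word-frequency dict."""
    vocab = {tuple(w) + ('</w>',): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge everywhere in the vocabulary.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

def segment(word, merges):
    """Split one word into subwords by replaying the learned merges in order."""
    symbols = list(word) + ['</w>']
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == (a, b):
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

# Hypothetical toy corpus: word -> frequency.
corpus = {'lower': 5, 'low': 4, 'newer': 6, 'newest': 3}
for n in (5, 15):
    merges = learn_bpe(corpus, n)
    print(n, segment('lowest', merges))
```

Training several such models with different `num_merges` values and segmenting the same parallel sentence with each of them yields several distinct subword views of that sentence, which is the augmentation mechanism the abstract describes.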