Improving Multilingual Neural Machine Translation System for Indic Languages

A Machine Translation System (MTS) serves as an effective communication tool by translating text or speech from one language to another. The need for an efficient translation system is obvious in a large multilingual environment like India, where English and a set of Indian Languages (ILs) are officially used. In contrast with English, ILs are still treated as low-resource languages because corpora are unavailable. Multilingual neural machine translation (MNMT) has emerged as a well-suited approach to this asymmetry. In this paper, we propose an MNMT system to address the issues related to low-resource language translation. Our model comprises two MNMT systems, one for English-Indic (one-to-many) and the other for Indic-English (many-to-one), with a shared encoder-decoder covering 15 language pairs (30 translation directions). Since most IL pairs have only a scanty amount of parallel corpora, not sufficient for training any machine translation model, we explore various augmentation strategies to improve overall translation quality through the proposed model. A state-of-the-art transformer architecture is used to realize the proposed model. Experiments on a sizeable amount of data show that it outperforms conventional models. In addition, the paper addresses the use of language relationships (in terms of dialect, script, etc.), particularly the role of high-resource languages of the same family in boosting the performance of low-resource languages. Moreover, the experimental results show the advantage of backtranslation and domain adaptation for ILs in enhancing the translation quality of both source and target languages. Using all these key approaches, our proposed model proves more efficient than the baseline model in terms of the evaluation metric, i.e., the BLEU (BiLingual Evaluation Understudy) score, for a set of ILs.
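
The abstract does not say how the training data for the two systems is assembled, so the following is only a minimal sketch under stated assumptions. It assumes the one-to-many system selects its output language with a target-language token prepended to the English source (a common MNMT convention, not confirmed by the paper), and that backtranslation pairs monolingual target sentences with synthetic sources produced by a reverse model. The function names, the `<2xx>` token format, and the `reverse_translate` callable are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def tag_for_target(src: str, tgt_lang: str) -> str:
    """Prepend a target-language token (assumed convention, e.g. '<2hi>')."""
    return f"<2{tgt_lang}> {src}"


def build_one_to_many(corpora: Dict[str, List[Pair]]) -> List[Pair]:
    """English-Indic data: `corpora` maps an Indic language code to
    (English, Indic) sentence pairs; the English side gets a target tag."""
    examples: List[Pair] = []
    for lang, pairs in corpora.items():
        for en, indic in pairs:
            examples.append((tag_for_target(en, lang), indic))
    return examples


def build_many_to_one(corpora: Dict[str, List[Pair]]) -> List[Pair]:
    """Indic-English data: swap each pair; the target is always English,
    so no target tag is needed in this sketch."""
    examples: List[Pair] = []
    for pairs in corpora.values():
        for en, indic in pairs:
            examples.append((indic, en))
    return examples


def back_translate(
    monolingual_targets: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Create synthetic (source, target) pairs from target-side monolingual
    text, using a hypothetical reverse (target-to-source) model."""
    return [(reverse_translate(t), t) for t in monolingual_targets]


if __name__ == "__main__":
    # Toy data; the language codes and sentences are illustrative only.
    corpora = {
        "hi": [("Hello", "नमस्ते")],
        "bn": [("Hello", "হ্যালো")],
    }
    print(build_one_to_many(corpora))
    print(build_many_to_one(corpora))
```

With 15 language pairs, the two builders together cover the 30 translation directions mentioned in the abstract: 15 English-to-Indic in the first system and 15 Indic-to-English in the second.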

