Neural machine translation (NMT) is a challenging task due to the inherent complexity and fluidity of natural languages. Nonetheless, in recent years it has achieved state-of-the-art performance on several language pairs. Although multilingual neural machine translation (MNMT) has gained considerable traction in recent years, no comprehensive survey has been conducted to identify which approaches work well. The goal of this project is to investigate the realm of low-resource languages and build a neural machine translation model that achieves state-of-the-art results. The project builds upon the \texttt{mBART.CC25} \cite{liu2020multilingual} language model and explores strategies to augment it with NLP and deep learning techniques such as back-translation and transfer learning. This implementation unpacks the architecture of the NMT application and identifies the components that offer opportunities to adapt it within the low-resource language problem space.