Neural Machine Translation (NMT) models have proven effective when trained on large bilingual datasets. However, existing methods show that a model's performance depends heavily on the amount of parallel training data, and for many languages such corpora are simply unavailable. Taking inspiration from monolingual speakers who explore new languages using bilingual dictionaries, we investigate the applicability of bilingual dictionaries to languages with extremely little or no bilingual corpus. In this paper, we explore methods that combine bilingual dictionaries with an NMT model to improve translation for extremely low-resource languages. We extend this work to multilingual systems, which exhibit zero-shot properties. We present a detailed analysis of how dictionary quality, training dataset size, language family, and other factors affect translation quality. Results on multiple low-resource test languages show a clear advantage of our bilingual dictionary-based method over the baselines.
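To make the idea concrete, one common way to exploit a bilingual dictionary in the absence of parallel data is to synthesize noisy pseudo-parallel examples by word-for-word dictionary substitution. The following is a minimal sketch of that idea, not the paper's exact method; the function names (`dictionary_translate`, `build_pseudo_parallel`) and the toy dictionary are illustrative assumptions.

```python
# Hypothetical illustration (not the paper's exact method): synthesize
# pseudo-parallel data via word-for-word dictionary substitution, a common
# way to use a bilingual dictionary when no parallel corpus exists.
from typing import Dict, List, Tuple


def dictionary_translate(sentence: str, bilingual_dict: Dict[str, str]) -> str:
    """Replace each source token with its dictionary entry, keeping
    out-of-dictionary tokens unchanged (a copy-through fallback)."""
    tokens = sentence.split()
    return " ".join(bilingual_dict.get(tok.lower(), tok) for tok in tokens)


def build_pseudo_parallel(monolingual: List[str],
                          bilingual_dict: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair each monolingual sentence with its word-for-word gloss,
    yielding noisy (source, target) examples for NMT training."""
    return [(src, dictionary_translate(src, bilingual_dict))
            for src in monolingual]


# Toy usage with a hypothetical English->Spanish dictionary fragment.
toy_dict = {"the": "el", "cat": "gato", "sleeps": "duerme"}
print(build_pseudo_parallel(["the cat sleeps"], toy_dict))
# [('the cat sleeps', 'el gato duerme')]
```

Such glosses ignore word order and morphology, so they serve only as weak supervision that a downstream NMT model must learn to denoise.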