Little research has been done on Neural Machine Translation (NMT) for Azerbaijani. In this paper, we benchmark the performance of Azerbaijani-English NMT systems across a range of techniques and datasets. We evaluate which segmentation techniques work best for Azerbaijani translation and benchmark the performance of Azerbaijani NMT models across several text domains. Our results show that while Unigram segmentation improves NMT performance and Azerbaijani translation models scale better with dataset quality than quantity, cross-domain generalization remains a challenge.