This paper summarizes our findings from the shared task on machine translation of Dravidian languages. We placed first in three of the five sub-tasks of the main shared task. We carried out neural machine translation for the following five language pairs: Kannada to Tamil, Kannada to Telugu, Kannada to Malayalam, Kannada to Sanskrit, and Kannada to Tulu. For each language pair, we trained a range of translation models: Seq2Seq architectures such as LSTM, bidirectional LSTM, and Conv2Seq; state-of-the-art transformers trained from scratch; and fine-tuned pre-trained models. For some models with access to monolingual corpora, we also applied backtranslation. The models were then evaluated on a held-out portion of each dataset, using BLEU score as the evaluation metric.
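As a rough illustration of the evaluation metric mentioned above, the following is a minimal BLEU sketch in plain Python: clipped n-gram precision combined with a brevity penalty. This is a simplified stand-in for the standard sentence-level BLEU (it assumes whitespace tokenization, a single reference, and naive smoothing), not the exact scorer used in the shared task.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference.

    Geometric mean of clipped 1..max_n-gram precisions, scaled by
    a brevity penalty for candidates shorter than the reference.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Tiny floor keeps a zero precision from collapsing the log-sum.
        precisions.append(max(overlap, 1e-9) / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

In practice, shared-task evaluation uses a standardized implementation (e.g. corpus-level BLEU with proper smoothing), but the computation above captures the core idea: translations are rewarded for matching reference n-grams without over-generating or under-generating.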