Cyrillic and Traditional Mongolian are the two main scripts of the Mongolian writing system. The Cyrillic-Traditional Mongolian Bidirectional Conversion (CTMBC) task comprises two conversion directions: Cyrillic Mongolian to Traditional Mongolian (C2T) and Traditional Mongolian to Cyrillic Mongolian (T2C). Since the CTMBC task is naturally a Sequence-to-Sequence (Seq2Seq) modeling problem, previous researchers adopted the traditional joint sequence model. Recent studies have shown that Recurrent Neural Network (RNN) and Self-attention (Transformer) based encoder-decoder models bring significant improvements in machine translation between major languages such as Mandarin, English, and French. However, whether CTMBC quality can be improved by RNN and Transformer models remains an open question. To answer this question, this paper investigates the utility of these two powerful techniques for the CTMBC task, taking into account the agglutinative characteristics of the Mongolian language. We build encoder-decoder based CTMBC models on RNN and Transformer architectures respectively and compare different network configurations in depth. The experimental results show that both the RNN and Transformer models outperform the traditional joint sequence model, with the Transformer achieving the best performance. Compared with the joint sequence baseline, the word error rate (WER) of the Transformer for C2T and T2C decreased by 5.72\% and 5.06\% respectively.
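To make the Seq2Seq framing concrete, the following is a minimal sketch (not the authors' implementation) of a character-level encoder-decoder Transformer for the C2T direction in PyTorch. The vocabulary sizes, embedding dimension, layer counts, and the class name CTMBCTransformer are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a character-level encoder-decoder Transformer for
# Cyrillic -> Traditional Mongolian (C2T) conversion as a Seq2Seq task.
# All hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn

class CTMBCTransformer(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d_model=256, nhead=4,
                 num_layers=3, dim_ff=512, max_len=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)   # Cyrillic characters
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)   # Traditional Mongolian characters
        self.pos_emb = nn.Embedding(max_len, d_model)     # learned positional embeddings
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt):
        # src: (B, S) source token ids, tgt: (B, T) shifted target token ids
        sp = torch.arange(src.size(1), device=src.device)
        tp = torch.arange(tgt.size(1), device=tgt.device)
        src_x = self.src_emb(src) + self.pos_emb(sp)
        tgt_x = self.tgt_emb(tgt) + self.pos_emb(tp)
        # Causal mask: each target position attends only to earlier positions
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(src.device)
        h = self.transformer(src_x, tgt_x, tgt_mask=tgt_mask)
        return self.out(h)  # (B, T, tgt_vocab) logits for cross-entropy training
```

The same skeleton covers the T2C direction by swapping the source and target vocabularies; an RNN-based baseline would replace nn.Transformer with an encoder-decoder built from GRU or LSTM layers plus attention.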