Machine translation (MT) has long been one of the most active research topics in the natural language processing (NLP) literature. One important issue in MT is how to evaluate an MT system reliably and determine whether the translation system has actually improved. Traditional manual judgment methods are expensive, time-consuming, and unrepeatable, and sometimes suffer from low inter-annotator agreement. On the other hand, the popular automatic MT evaluation methods have several weaknesses. Firstly, they tend to perform well on language pairs with English as the target language, but poorly when English is the source language. Secondly, some methods rely on many additional linguistic features to achieve good performance, which makes them difficult to replicate and to apply to other language pairs. Thirdly, some popular metrics rely on an incomplete set of factors, which results in low performance on some practical tasks. In this thesis, to address these problems, we design novel MT evaluation methods and investigate their performance on different languages. Firstly, we design augmented factors to yield highly accurate evaluation. Secondly, we design a tunable evaluation model in which the weighting of factors can be optimized according to the characteristics of each language. Thirdly, in the enhanced version of our methods, we design a concise linguistic feature based on part-of-speech (POS) tags, showing that our methods can yield even higher performance when external linguistic resources are used. Finally, we report the practical performance of our metrics in the ACL-WMT workshop shared tasks, which shows that the proposed methods are robust across different languages. In addition, we present novel work on quality estimation of MT without reference translations, using naïve Bayes (NB) probability models, support vector machine (SVM) classification algorithms, and conditional random fields (CRFs).
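To give a rough sense of the tunable model described above, the sketch below combines several sentence-level evaluation factors with a weighted harmonic mean whose weights can be optimized per language pair. This is a minimal illustration, not the thesis's actual formulation: the factor definitions, names, and weights are assumptions.

```python
import math

def length_penalty(hyp_len: int, ref_len: int) -> float:
    """Penalty below 1 when hypothesis and reference lengths differ."""
    if hyp_len == ref_len:
        return 1.0
    shorter, longer = sorted((hyp_len, ref_len))
    return math.exp(1 - longer / shorter)

def harmonic_precision_recall(matches: int, hyp_len: int, ref_len: int,
                              alpha: float = 1.0) -> float:
    """Weighted harmonic mean of unigram precision and recall."""
    if matches == 0:
        return 0.0
    precision = matches / hyp_len
    recall = matches / ref_len
    return (1 + alpha) * precision * recall / (alpha * precision + recall)

def tunable_score(factors: dict, weights: dict) -> float:
    """Weighted harmonic mean of factors; weights are tuned per language pair."""
    total_weight = sum(weights[name] for name in factors)
    denominator = sum(weights[name] / max(score, 1e-9)
                      for name, score in factors.items())
    return total_weight / denominator

# Hypothetical weights for one language pair; tuning them for another
# language pair changes how much each factor contributes.
factors = {"length": length_penalty(11, 12),
           "match": harmonic_precision_recall(8, 11, 12)}
weights = {"length": 1.0, "match": 3.0}
print(tunable_score(factors, weights))
```

Raising a factor's weight makes the combined score more sensitive to that factor, which is what allows the same model to be adapted to languages with different characteristics.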
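Likewise, the reference-free quality estimation work can be framed as sentence-level classification. The sketch below is a toy illustration of that framing with NB and SVM classifiers; the feature set and labels are invented for the example and are not the features used in the thesis.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Toy feature vectors per translation, with no reference translation used:
# [source length, output length, length ratio, unknown-word rate of output].
X = np.array([
    [12, 13, 1.08, 0.00],   # fluent-looking output
    [12,  5, 0.42, 0.40],   # truncated output with many unknown words
    [20, 21, 1.05, 0.05],
    [18,  9, 0.50, 0.33],
])
y = np.array([1, 0, 1, 0])  # 1 = acceptable translation, 0 = poor

nb = GaussianNB().fit(X, y)
svm = SVC(kernel="rbf").fit(X, y)

candidate = np.array([[15, 14, 0.93, 0.07]])
print("NB :", nb.predict(candidate))
print("SVM:", svm.predict(candidate))
```

A real quality estimation system would extract far richer features from the source sentence and the MT output (for example, language model scores), but the pipeline shape, featurize then classify without a reference, is the same.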