Since the 1950s, machine translation (MT) has been one of the important tasks in AI research and development, and it has passed through several distinct periods and stages, including rule-based methods, statistical methods, and, more recently, neural network-based learning methods. Accompanying these staged leaps has been research and development on MT evaluation, with evaluation methods playing an especially important role in statistical and neural translation research. The task of MT evaluation is not only to assess the quality of machine translation, but also to give machine translation researchers timely feedback on the problems in the translation output itself and on how to improve and optimise their systems. In some practical application settings, such as when no reference translations are available, quality estimation of machine translation plays an important role as an indicator of how credible the automatically translated target-language text is. This report mainly covers the following: a brief history of machine translation evaluation (MTE), a classification of MTE research methods, and cutting-edge progress, including human evaluation, automatic evaluation, and the evaluation of evaluation methods (meta-evaluation). Both human evaluation and automatic evaluation include reference-based and reference-free approaches; automatic evaluation methods include traditional n-gram string matching, models applying syntax and semantics, and deep learning models; the evaluation of evaluation methods includes estimating the credibility of human evaluations, the reliability of automatic evaluation, the reliability of the test set, etc. Advances in cutting-edge evaluation methods include task-based evaluation, the use of pre-trained language models based on big data, and lightweight optimised models built using distillation techniques.