SemMT: A Semantic-based Testing Approach for Machine Translation Systems

Machine translation has wide applications in daily life. In mission-critical applications such as translating official documents, an incorrect translation can have unpleasant or sometimes catastrophic consequences. This motivates recent research on testing methodologies for machine translation systems. Existing methodologies mostly rely on metamorphic relations designed at the textual level (e.g., Levenshtein distance) or the syntactic level (e.g., the distance between grammar structures) to determine the correctness of translation results. However, these metamorphic relations do not consider whether the original and translated sentences have the same meaning (i.e., semantic similarity). Therefore, in this paper, we propose SemMT, an automatic testing approach for machine translation systems based on semantic similarity checking. SemMT applies round-trip translation and measures the semantic similarity between the original and translated sentences. Our insight is that the semantics expressed by the logic and numeric constraints in sentences can be captured using regular expressions (or deterministic finite automata), for which efficient equivalence/similarity checking algorithms are available. Leveraging this insight, we propose three semantic similarity metrics and implement them in SemMT. The experimental results reveal that SemMT achieves higher effectiveness than state-of-the-art works, with increases of 21% and 23% in accuracy and F-Score, respectively. We also explore potential improvements that can be achieved when proper combinations of metrics are adopted. Finally, we discuss a solution to locate the suspicious trip in round-trip translation, which may shed light on further exploration.
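To make the automata-based insight concrete, the following is a minimal sketch of checking two deterministic finite automata for language equivalence, plus a crude similarity score. It is an illustration only and not SemMT's implementation: the abstract does not define the three metrics, so the `DFA` class, the `equivalent` and `bounded_jaccard` functions, and the hand-built automata are all hypothetical. The example assumes that a sentence such as "at least two digits" has already been converted to an automaton over a one-symbol alphabet `d`.

```python
from collections import deque
from itertools import product


class DFA:
    """A DFA whose missing transitions lead to an implicit dead state (None)."""

    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = frozenset(accepting)
        self.transitions = dict(transitions)  # {(state, symbol): next_state}

    def step(self, state, symbol):
        if state is None:
            return None
        return self.transitions.get((state, symbol))

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.step(state, symbol)
        return state in self.accepting


def equivalent(a, b, alphabet):
    """Explore the product automaton; the languages differ if some
    reachable state pair disagrees on acceptance."""
    seen = {(a.start, b.start)}
    queue = deque(seen)
    while queue:
        sa, sb = queue.popleft()
        if (sa in a.accepting) != (sb in b.accepting):
            return False
        for symbol in alphabet:
            pair = (a.step(sa, symbol), b.step(sb, symbol))
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True


def bounded_jaccard(a, b, alphabet, max_len):
    """Assumed stand-in metric: Jaccard overlap of the words of length
    at most max_len accepted by each automaton."""
    both = either = 0
    for length in range(max_len + 1):
        for word in product(alphabet, repeat=length):
            in_a, in_b = a.accepts(word), b.accepts(word)
            both += in_a and in_b
            either += in_a or in_b
    return 1.0 if either == 0 else both / either


def at_least(n):
    """DFA for 'at least n digits', i.e. the regular expression d{n,}."""
    transitions = {(i, "d"): i + 1 for i in range(n)}
    transitions[(n, "d")] = n
    return DFA(start=0, accepting={n}, transitions=transitions)


# 'More than one digit' (dd+), built with a different state structure.
more_than_one = DFA(
    start="q0",
    accepting={"q2", "q3"},
    transitions={("q0", "d"): "q1", ("q1", "d"): "q2",
                 ("q2", "d"): "q3", ("q3", "d"): "q2"},
)

ALPHABET = ["d"]
same_meaning = equivalent(at_least(2), more_than_one, ALPHABET)
changed_meaning = equivalent(at_least(2), at_least(3), ALPHABET)
overlap = bounded_jaccard(at_least(2), at_least(3), ALPHABET, max_len=5)
```

Here `same_meaning` is true even though the two automata are structured differently, while `changed_meaning` is false, which is the kind of discrepancy a round-trip translation that turns "at least two" into "at least three" would expose. The bounded Jaccard score is only one plausible way to grade partial overlap; a real metric may be defined quite differently.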

