Machine translation is widely used in daily life. In mission-critical applications such as translating official documents, an incorrect translation can have unpleasant or even catastrophic consequences. This has motivated recent research on testing methodologies for machine translation systems. Existing methodologies mostly rely on metamorphic relations designed at the textual level (e.g., Levenshtein distance) or the syntactic level (e.g., the distance between grammar structures) to determine the correctness of translation results. However, these metamorphic relations do not consider whether the original and translated sentences have the same meaning (i.e., semantic similarity). In this paper, we therefore propose SemMT, an automatic testing approach for machine translation systems based on semantic similarity checking. SemMT applies round-trip translation and measures the semantic similarity between the original and translated sentences. Our insight is that the semantics expressed by the logical and numeric constraints in sentences can be captured using regular expressions (or deterministic finite automata), for which efficient equivalence and similarity checking algorithms are available. Leveraging this insight, we propose three semantic similarity metrics and implement them in SemMT. Experimental results show that SemMT is more effective than state-of-the-art works, achieving increases of 21% and 23% in accuracy and F-score, respectively. We also explore the improvements that can be achieved when proper combinations of metrics are adopted. Finally, we discuss a solution for locating the suspicious trip in round-trip translation, which may shed light on further exploration.
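To make the regular-expression insight concrete, the following is a minimal sketch, not SemMT's implementation and not one of its three metrics. It assumes the original sentence and its round-trip translation have already been converted to regular expressions by some earlier step, and it approximates their similarity as the Jaccard index of the strings each regex accepts up to a bounded length. The function names `accepted_strings` and `bounded_jaccard`, the alphabet, and the length bound are all hypothetical choices made for illustration.

```python
import re
from itertools import product


def accepted_strings(pattern: str, alphabet: str, max_len: int) -> set:
    """Return every string over `alphabet` of length <= max_len that fully matches `pattern`."""
    compiled = re.compile(pattern)
    accepted = set()
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if compiled.fullmatch(candidate):
                accepted.add(candidate)
    return accepted


def bounded_jaccard(pattern_a: str, pattern_b: str,
                    alphabet: str = "ab", max_len: int = 4) -> float:
    """Approximate similarity of two regexes (hypothetical metric, for illustration only).

    Compares the sets of accepted strings up to `max_len`; 1.0 means the two
    regexes are indistinguishable within the bound, 0.0 means they share nothing.
    """
    set_a = accepted_strings(pattern_a, alphabet, max_len)
    set_b = accepted_strings(pattern_b, alphabet, max_len)
    union = set_a | set_b
    if not union:
        # Neither regex accepts anything within the bound: treat as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# "at least two a's" vs. "at most two a's": a numeric constraint flipped in translation.
score_flipped = bounded_jaccard(r"a{2,}", r"a{0,2}")
# Two different spellings of the same language.
score_same = bounded_jaccard(r"(a|b)*", r"[ab]*")
```

A low score such as `score_flipped` would flag the round trip as suspicious even though the two sentences differ by a single word, which a textual distance would barely register. The bounded enumeration is exponential in `max_len`; exact equivalence checking on deterministic finite automata avoids that cost, but is beyond this sketch.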