When translating text whose main message is sentiment, human translators pay particular attention to sentiment-carrying words. The reason is that an incorrect translation of such words would miss the fundamental aspect of the source text, i.e. the author's sentiment. In the online world, machine translation (MT) systems are extensively used to translate User-Generated Content (UGC) such as reviews, tweets, and social media posts, where the main message is often the author's positive or negative attitude towards the topic of the text. In such scenarios it is important to measure accurately how far an MT system can be relied on as a real-life utility for transferring the correct affect message. This paper tackles an under-recognised problem in MT evaluation: judging to what extent automatic metrics concur with the gold standard of human evaluation for a correct translation of sentiment. We evaluate the efficacy of conventional quality metrics in spotting a mistranslation of sentiment, especially when it is the sole error in the MT output. We propose a numerical 'sentiment-closeness' measure appropriate for assessing how accurately an MT system translates the affect message in UGC text. We show that incorporating this sentiment-aware measure can significantly enhance the correlation of some available quality metrics with the human judgement of an accurate translation of sentiment.