The explosion of user-generated content (UGC), such as social media posts, comments, and reviews, has motivated the development of NLP applications tailored to these informal texts. Prevalent among these applications are sentiment analysis and machine translation (MT). Grounded in the observation that UGC features highly idiomatic, sentiment-charged language, we propose a decoder-side approach that incorporates automatic sentiment scoring into the MT candidate selection process. We train separate English and Spanish sentiment classifiers. Then, from the n-best candidates generated by a baseline MT model with beam search, we select the candidate that minimizes the absolute difference between the sentiment score of the source sentence and that of the translation. Unlike previous work, we compare these sentiment scores on a continuous interval rather than using, e.g., binary classification, which allows for more fine-grained selection of translation candidates. A human evaluation of the produced translations shows that, compared with the open-source MT baseline model on which our sentiment-based pipeline is built, our pipeline produces more accurate translations of colloquial, sentiment-heavy source texts.
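The selection step described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function name `select_min_divergence`, the scorer arguments, the example sentences, and the toy scores are all hypothetical, and the lookup-table scorers merely stand in for trained sentiment classifiers that return a score on a continuous interval (assumed here to be [0, 1]). The English-to-Spanish direction in the example is likewise an assumption.

```python
from typing import Callable, Sequence


def select_min_divergence(
    source: str,
    candidates: Sequence[str],
    score_source: Callable[[str], float],
    score_target: Callable[[str], float],
) -> str:
    """Return the candidate whose sentiment score is closest to the source's.

    `candidates` is assumed to be the n-best list from beam search, ordered
    from most to least probable under the baseline MT model.
    """
    if not candidates:
        raise ValueError("candidates must be non-empty")
    s_src = score_source(source)
    # min() returns the first minimal element, so a tie is resolved in
    # favor of the candidate the baseline model ranked higher.
    return min(candidates, key=lambda c: abs(s_src - score_target(c)))


# Toy stand-ins for the two sentiment classifiers (scores are made up).
toy_en_scores = {"This movie is the bomb!": 0.90}
toy_es_scores = {
    "Esta película es la bomba.": 0.35,  # literal rendering, sentiment drifts
    "¡Esta película es genial!": 0.88,   # idiomatic rendering
}

source = "This movie is the bomb!"
n_best = ["Esta película es la bomba.", "¡Esta película es genial!"]

best = select_min_divergence(
    source,
    n_best,
    score_source=toy_en_scores.__getitem__,
    score_target=toy_es_scores.__getitem__,
)
```

With these toy scores the absolute differences are 0.55 and 0.02, so the second candidate is selected even though the baseline model ranked it lower. Because the scores are continuous, candidates that a binary classifier would place in the same class can still be told apart.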