The rapid development of natural language processing tasks such as style transfer, paraphrasing, and machine translation often calls for semantic preservation metrics. In recent years, many methods for assessing the semantic similarity of two short texts have been developed. This paper provides a comprehensive analysis of more than a dozen such methods. Using a new dataset of fourteen thousand sentence pairs human-labeled according to their semantic similarity, we demonstrate that none of the metrics widely used in the literature is close enough to human judgment to be used on its own in these tasks. The recently proposed Word Mover's Distance (WMD), together with bilingual evaluation understudy (BLEU) and part-of-speech (POS) distance, seems to form a reasonable combined solution for measuring semantic preservation in reformulated texts. We encourage the research community to use the ensemble of these metrics until a better solution is found.