Recent advances in automatic evaluation metrics for text have shown that deep contextualized word representations, such as those generated by BERT encoders, are helpful for designing metrics that correlate well with human judgements. At the same time, it has been argued that contextualized word representations exhibit sub-optimal statistical properties for encoding the true similarity between words or sentences. In this paper, we present two techniques for improving encoding representations for similarity metrics: a batch-mean centering strategy that improves statistical properties; and a computationally efficient tempered Word Mover's Distance, for better fusion of the information in the contextualized word representations. We conduct numerical experiments that demonstrate the robustness of our techniques, reporting results over various BERT-backbone learned metrics and achieving state-of-the-art correlation with human ratings on several benchmarks.
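The abstract does not spell out the centering operation, but batch-mean centering is conventionally understood as subtracting the mean embedding of a batch from every contextualized embedding, removing the shared dominant direction that inflates all pairwise cosine similarities. A minimal sketch under that assumption (function name and toy data are illustrative, not from the paper):

```python
import numpy as np

def batch_mean_center(embeddings: np.ndarray) -> np.ndarray:
    """Subtract the batch mean from each row.

    embeddings: (num_words, dim) array of contextualized word embeddings.
    Returns embeddings with the shared mean direction removed.
    """
    return embeddings - embeddings.mean(axis=0, keepdims=True)

# Toy example: two embeddings that share a large common component.
emb = np.array([[10.0, 1.0],
                [10.0, -1.0]])
centered = batch_mean_center(emb)
# The centered embeddings have zero mean per dimension, so cosine
# similarity now reflects only the directions in which they differ.
```

After centering, the near-1 cosine similarity caused by the common component disappears, which is the statistical-property improvement the abstract alludes to.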