Although text style transfer has developed rapidly in recent years, there is as yet no established standard for its evaluation. Because human judgement cannot always be resorted to, evaluation typically relies on several automatic metrics. We focus on the task of formality transfer and on the three aspects that are usually evaluated: style strength, content preservation, and fluency. To shed light on how these aspects are assessed by common and new metrics, we run a human-based evaluation and perform a rich correlation analysis. We are then able to offer some recommendations on the use of such metrics in formality transfer, also with an eye to whether they generalise to related tasks.