While the field of style transfer (ST) has been growing rapidly, it has been hampered by a lack of standardized practices for automatic evaluation. In this paper, we evaluate leading automatic ST metrics on the oft-researched task of formality style transfer. Unlike previous evaluations, which focus solely on English, we expand our focus to Brazilian Portuguese, French, and Italian, making this work the first multilingual evaluation of metrics in ST. We outline best practices for automatic evaluation in (formality) style transfer and identify several models that correlate well with human judgments and are robust across languages. We hope that this work will help accelerate development in ST, where human evaluations are often difficult to collect.
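A metric is usually judged by how well its scores track human ratings of the same outputs. As a minimal sketch, assuming segment-level scores and Spearman's rank correlation (Pearson or Kendall's tau are common alternatives), the agreement between a metric and human judgments could be computed as follows. The function names and toy scores are hypothetical illustrations, not the protocol or data used in this paper.

```python
import math
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: one metric's scores vs. human formality ratings (1-5)
metric_scores = [0.91, 0.40, 0.75, 0.62]
human_ratings = [5, 2, 4, 3]
print(spearman(metric_scores, human_ratings))
```

Rank correlation is used here because it only asks whether the metric orders outputs the same way humans do, which sidesteps differences in scale between a metric's raw scores and a Likert-style rating. A per-language evaluation would repeat this computation separately for each language's data.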