Interpretability and efficiency are two important considerations for the adoption of neural automatic metrics. In this work, we develop strong-performing automatic metrics for reference-based summarization evaluation. They are based on a two-stage evaluation pipeline that first extracts basic information units from one text sequence and then checks the extracted units against another sequence. The metrics we develop include two-stage metrics that can provide high interpretability at both the fine-grained unit level and the summary level, and one-stage metrics that strike a balance between efficiency and interpretability. We make the developed tools publicly available through a Python package and GitHub.
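The toy Python sketch below makes the two-stage pipeline concrete. It is not the metrics' actual implementation or the released package's API. The names `extract_units`, `unit_supported`, and `two_stage_score` are hypothetical. Sentence splitting stands in for a neural unit extractor, and a token-overlap threshold stands in for a neural checker. Extracting units from the reference and checking them in the summary gives a recall-style score, and swapping the arguments gives a precision-style score. The per-unit labels illustrate unit-level interpretability, and their average is the summary-level score.

```python
import re
from typing import List, Set, Tuple


def extract_units(text: str) -> List[str]:
    # Stage 1 (hypothetical stand-in): split the text into sentence-level
    # "units". A real metric would use a model to extract finer-grained units.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokens(text: str) -> Set[str]:
    # Lowercased alphanumeric tokens, used by the toy checker below.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def unit_supported(unit: str, other: str, threshold: float = 0.6) -> bool:
    # Stage 2 (hypothetical stand-in): a unit counts as present if at least
    # `threshold` of its tokens occur in the other text. A real metric would
    # use a neural checker (e.g. an entailment model) here instead.
    unit_toks = tokens(unit)
    if not unit_toks:
        return False
    return len(unit_toks & tokens(other)) / len(unit_toks) >= threshold


def two_stage_score(
    source: str, target: str, threshold: float = 0.6
) -> Tuple[float, List[Tuple[str, bool]]]:
    # Extract units from `source`, check each one in `target`, and return the
    # summary-level score together with the unit-level labels.
    units = extract_units(source)
    labels = [unit_supported(u, target, threshold) for u in units]
    score = sum(labels) / len(labels) if labels else 0.0
    return score, list(zip(units, labels))


reference = "The cat sat on the mat. It was raining outside."
summary = "A cat sat on the mat."
recall, details = two_stage_score(reference, summary)
precision, _ = two_stage_score(summary, reference)
print(recall, precision)
for unit, found in details:
    print(found, unit)
```

In this example the recall-style score is 0.5. Only the first reference unit is found in the summary, and the per-unit labels show which unit was missed. The precision-style score is 1.0 because the summary's single unit is supported by the reference.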