Despite recent progress in abstractive summarization, systems still suffer from faithfulness errors. While prior work has proposed models that improve faithfulness, it is unclear whether the improvement comes from an increased level of extractiveness in the model outputs, as one naive way to improve faithfulness is to make summarization models more extractive. In this work, we present a framework for evaluating the effective faithfulness of summarization systems by generating a faithfulness-abstractiveness trade-off curve that serves as a control at different operating points on the abstractiveness spectrum. We then show that the Maximum Likelihood Estimation (MLE) baseline, as well as a recently proposed method for improving faithfulness, are both worse than the control at the same level of abstractiveness. Finally, we learn a selector that identifies the most faithful and abstractive summary for a given document, and show that this system attains higher faithfulness scores in human evaluations while being more abstractive than the baseline system on two datasets. Moreover, we show that our system achieves a better faithfulness-abstractiveness trade-off than the control at the same level of abstractiveness.