Missing information is a common issue in dialogue summarization, where some of the information in the reference summaries is not covered by the generated summaries. To address this issue, we propose to utilize natural language inference (NLI) models to improve coverage while avoiding the introduction of factual inconsistencies. Specifically, we use NLI to compute fine-grained training signals that encourage the model to generate content from the reference summaries that has not yet been covered, and to distinguish between factually consistent and inconsistent generated sentences. Experiments on the DialogSum and SAMSum datasets confirm the effectiveness of the proposed approach in balancing coverage and faithfulness, validated with both automatic metrics and human evaluations. Additionally, we compute the correlation of commonly used automatic metrics with human judgments along three dimensions related to coverage and factual consistency, providing insight into the most suitable metric for evaluating dialogue summaries.
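To make the idea of NLI-derived, sentence-level signals concrete, below is a minimal sketch of how such signals could be computed with an off-the-shelf NLI model. The model choice (roberta-large-mnli), the 0.5 entailment threshold, and the helper names are illustrative assumptions for exposition, not the paper's actual training procedure.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed off-the-shelf NLI model; any sequence-pair NLI classifier would do.
MODEL_NAME = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()


def entailment_prob(premise: str, hypothesis: str) -> float:
    """Probability that `premise` entails `hypothesis` under the NLI model."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    # roberta-large-mnli label order: 0=contradiction, 1=neutral, 2=entailment
    return probs[2].item()


def uncovered_reference_sentences(reference_sents, generated_summary,
                                  threshold=0.5):
    """Coverage signal: reference sentences not yet entailed by the
    generated summary are candidates the model should be encouraged
    to produce. Threshold is an illustrative assumption."""
    return [s for s in reference_sents
            if entailment_prob(generated_summary, s) < threshold]


def split_by_faithfulness(dialogue, generated_sents, threshold=0.5):
    """Faithfulness signal: partition generated sentences into those
    entailed by the source dialogue (consistent) and the rest."""
    consistent, inconsistent = [], []
    for s in generated_sents:
        bucket = consistent if entailment_prob(dialogue, s) >= threshold \
            else inconsistent
        bucket.append(s)
    return consistent, inconsistent
```

Under this sketch, the two helpers would supply sentence-level targets during training: uncovered reference sentences reward additional coverage, while the consistent/inconsistent split penalizes hallucinated content.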
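The metric-correlation analysis mentioned above is, in essence, a standard correlation computation between per-summary metric scores and human ratings. A toy sketch follows; the scores and ratings shown are fabricated placeholders, not results from the paper.

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical per-summary scores for two common automatic metrics,
# aligned by summary index with mean human ratings on one dimension
# (e.g., coverage). All numbers below are illustrative only.
metric_scores = {
    "ROUGE-L": [0.41, 0.35, 0.52, 0.47],
    "BERTScore": [0.88, 0.84, 0.91, 0.89],
}
human_ratings = [3.0, 2.5, 4.0, 3.5]  # e.g., 1-5 Likert judgments

for name, scores in metric_scores.items():
    r, _ = pearsonr(scores, human_ratings)
    rho, _ = spearmanr(scores, human_ratings)
    print(f"{name}: Pearson r={r:.3f}, Spearman rho={rho:.3f}")
```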