Current metrics for evaluating the factuality of abstractive document summarization achieve high correlations with human judgment, but they do not account for the vision modality and are therefore inadequate for vision-and-language summarization. We propose CLIPBERTScore, a simple weighted combination of CLIPScore and BERTScore that leverages the robustness and strong factuality detection performance of the two metrics on image-summary and document-summary pairs, respectively. Next, due to the lack of meta-evaluation benchmarks for assessing the quality of multimodal factuality metrics, we collect human judgments of factuality with respect to documents and images. We show that this simple combination of two metrics, used in the zero-shot setting, achieves higher correlations than existing factuality metrics for document summarization, outperforms an existing multimodal summarization metric, and performs competitively with strong multimodal factuality metrics specifically fine-tuned for the task. Our thorough analysis demonstrates the robustness and high correlation of CLIPBERTScore and its components on four factuality metric-evaluation benchmarks. Finally, we demonstrate two practical downstream applications of CLIPBERTScore: selecting important images to focus on during training, and serving as a reward for reinforcement learning to improve the factuality of multimodal summary generation with respect to both automatic and human evaluation. Our data and code are publicly available at https://github.com/meetdavidwan/faithful-multimodal-summ
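
To make the "simple weighted combination" concrete, the following is a minimal sketch, not the released implementation. It assumes that the two component scores have already been computed (an image-summary CLIPScore and a document-summary BERTScore) and that they are combined convexly with a single weight `alpha`. The function name `clipbertscore`, the convex form, and the default value of `alpha` are illustrative assumptions; the abstract does not specify them.

```python
def clipbertscore(clip_score: float, bert_score: float, alpha: float = 0.5) -> float:
    """Combine an image-summary score and a document-summary score.

    Assumption: a convex combination with weight `alpha` on the
    image-summary (CLIPScore) term and `1 - alpha` on the
    document-summary (BERTScore) term. The abstract only says
    "simple weighted combination"; the exact form and weight
    used by the authors may differ.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * clip_score + (1.0 - alpha) * bert_score


# Hypothetical component scores for one (document, images, summary) triple.
combined = clipbertscore(clip_score=0.5, bert_score=1.0, alpha=0.25)
print(combined)  # 0.25 * 0.5 + 0.75 * 1.0 = 0.875
```

Setting `alpha` to 0 recovers the document-only score and setting it to 1 recovers the image-only score, so under this assumed form the weight controls how much the vision modality contributes to the final factuality estimate.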