Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-trained language models and the availability of large-scale datasets. Despite promising results, current models still generate factually inconsistent summaries, reducing their utility for real-world applications. Several recent efforts address this by devising models that automatically detect factual inconsistencies in machine-generated summaries. However, they focus exclusively on English, a language with abundant resources. In this work, we leverage factual consistency evaluation models to improve multilingual summarization. We explore two intuitive approaches to mitigating hallucinations based on the signal provided by a multilingual NLI model: data filtering and controlled generation. Experimental results across the 45 languages of the XLSum dataset show gains over strong baselines in both automatic and human evaluation.
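The data-filtering approach described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `toy_score` is a hypothetical stand-in for a multilingual NLI model that would return the probability that a summary is entailed by its source document, and the threshold value is arbitrary.

```python
# Sketch of NLI-based data filtering for summarization training data.
# A real system would replace `toy_score` with an entailment probability
# from a multilingual NLI model; this stub is purely illustrative.

def filter_training_pairs(pairs, score_fn, threshold=0.8):
    """Keep only (document, summary) pairs the scorer deems consistent."""
    return [(doc, summ) for doc, summ in pairs if score_fn(doc, summ) >= threshold]

def toy_score(doc, summ):
    # Hypothetical scorer: fraction of summary tokens present in the
    # document. Stands in for P(entailment) from an NLI model.
    doc_tokens = set(doc.lower().split())
    summ_tokens = summ.lower().split()
    return sum(t in doc_tokens for t in summ_tokens) / max(len(summ_tokens), 1)

pairs = [
    ("the cat sat on the mat", "the cat sat"),            # consistent
    ("the cat sat on the mat", "the dog barked loudly"),  # hallucinated
]
kept = filter_training_pairs(pairs, toy_score)
```

Under this sketch, only the consistent pair survives filtering; training the summarizer on the filtered set is what would discourage hallucination.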