Hallucination is a known issue for neural abstractive summarization models. Recent work suggests that the degree of hallucination may depend on errors in the training data. In this work, we propose a new method called Contrastive Parameter Ensembling (CaPE) to use training data more effectively, utilizing variations in noise across training samples to reduce hallucination. We first select clean and noisy subsets from the training data using different automatic factual metrics. Then, we fine-tune a base summarization model, which is trained on all training samples, on the clean (noisy) subset to obtain an \textit{expert} (\textit{anti-expert}) model. Finally, we adjust the parameters of the base model by the difference between the parameters of the \textit{expert} and \textit{anti-expert} models, steering the base model towards the \textit{expert} model and away from the \textit{anti-expert} model. Experimental results show that CaPE improves performance across different automatic factual metrics and human evaluation, with maximum improvements of 16.69\% and 15.78\% in summary-level dependency-arc entailment accuracy on the XSUM and CNN/DM datasets, respectively. The improvement in factual performance does not degrade performance on informativeness metrics such as ROUGE.
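As a sketch of the final adjustment step, assuming a single scaling coefficient $\alpha$ (a hypothetical hyperparameter not specified in the abstract), the parameter update can be written as:
\[
\theta_{\text{CaPE}} \;=\; \theta_{\text{base}} \;+\; \alpha \,\bigl(\theta_{\text{expert}} - \theta_{\text{anti-expert}}\bigr),
\]
where $\theta_{\text{base}}$, $\theta_{\text{expert}}$, and $\theta_{\text{anti-expert}}$ denote the parameters of the base, expert, and anti-expert models, respectively.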