Pre-trained language models (e.g., BART) have shown impressive results when fine-tuned on large summarization datasets. However, little is understood about this fine-tuning process, including what knowledge is retained from pre-training or how content selection and generation strategies are learnt across iterations. In this work, we analyze the training dynamics for generation models, focusing on news summarization. Across different datasets (CNN/DM, XSum, MediaSum) and summary properties, such as abstractiveness and hallucination, we study what the model learns at different stages of its fine-tuning process. We find that properties such as copy behavior are learnt earlier in the training process, and these observations are robust across domains. On the other hand, factual errors, such as hallucination of unsupported facts, are learnt in the later stages, and this behavior varies more across domains. Based on these observations, we explore complementary approaches for modifying training: first, disregarding high-loss tokens that are challenging to learn, and second, disregarding low-loss tokens that are learnt very quickly. This simple training modification allows us to configure the model to achieve different goals, such as improving factuality or improving abstractiveness.
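To make the training modification concrete, the sketch below shows one plausible way to disregard high-loss (or low-loss) tokens when computing the fine-tuning objective. This is a minimal illustration assuming a standard PyTorch seq2seq setup; the function name `truncated_token_loss`, the `drop_frac` fraction, and the padding convention (`-100` labels) are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def truncated_token_loss(logits, labels, drop_frac=0.1, drop="high"):
    """Token-level cross-entropy that ignores a fraction of tokens by loss.

    drop="high": skip the highest-loss tokens (hard to learn).
    drop="low":  skip the lowest-loss tokens (learnt very quickly).
    Note: this is an illustrative sketch, not the authors' code.
    """
    # Per-token losses, flattened over the batch; -100 marks padding.
    per_tok = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
        reduction="none",
    )
    valid = labels.view(-1) != -100
    losses = per_tok[valid]

    k = int(drop_frac * losses.numel())
    if k == 0:
        return losses.mean()

    # Rank tokens by loss (ascending) and average only the kept tokens.
    sorted_losses, _ = torch.sort(losses)
    kept = sorted_losses[:-k] if drop == "high" else sorted_losses[k:]
    return kept.mean()
```

Under this reading, `drop="high"` would steer the model away from hard-to-fit target tokens (e.g., those requiring unsupported facts), while `drop="low"` would down-weight tokens that are already fit early in training, such as copied spans.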