Bidirectional Encoder Representations from Transformers (BERT) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstractive models. We introduce a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences. Our extractive model is built on top of this encoder by stacking several inter-sentence Transformer layers. For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not). We also demonstrate that a two-staged fine-tuning approach can further boost the quality of the generated summaries. Experiments on three datasets show that our model achieves state-of-the-art results across the board in both extractive and abstractive settings. Our code is available at https://github.com/nlpyang/PreSumm
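The abstract's fine-tuning schedule, separate optimizers for the pretrained encoder and the randomly initialized decoder, can be illustrated with a short sketch. The snippet below is a minimal, hedged illustration in PyTorch and is not the authors' implementation (which lives in the linked PreSumm repository); the model wiring, the Noam-style warmup schedule, and the learning-rate and warmup values are assumptions chosen for illustration only.

```python
# Sketch: two optimizers with different peak learning rates and warmup lengths,
# one for the pretrained BERT encoder and one for the randomly initialized decoder.
# All hyperparameter values below are illustrative placeholders.
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR
from transformers import BertModel


class AbstractiveSummarizer(nn.Module):
    def __init__(self, vocab_size=30522, d_model=768):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")  # pretrained
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)      # trained from scratch
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        self.generator = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        memory = self.encoder(input_ids=src_ids, attention_mask=src_mask).last_hidden_state
        tgt = self.tgt_embed(tgt_ids)
        # Causal mask so each target position attends only to earlier positions.
        causal = torch.triu(
            torch.full((tgt_ids.size(1), tgt_ids.size(1)), float("-inf")), diagonal=1
        )
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.generator(out)


def noam_lambda(warmup):
    # Noam-style factor: min(step^-0.5, step * warmup^-1.5), applied to the base lr.
    return lambda step: min((step + 1) ** -0.5, (step + 1) * warmup ** -1.5)


model = AbstractiveSummarizer()

# Separate parameter groups: a smaller peak rate with a longer warmup for the
# pretrained encoder, a larger peak rate with a shorter warmup for the decoder.
enc_params = list(model.encoder.parameters())
dec_params = [p for n, p in model.named_parameters() if not n.startswith("encoder.")]
opt_enc = Adam(enc_params, lr=2e-3, betas=(0.9, 0.999))
opt_dec = Adam(dec_params, lr=0.1, betas=(0.9, 0.999))
sched_enc = LambdaLR(opt_enc, noam_lambda(warmup=20000))
sched_dec = LambdaLR(opt_dec, noam_lambda(warmup=10000))

# In the training loop both optimizers and schedulers step on every batch:
#   loss.backward()
#   opt_enc.step(); opt_dec.step()
#   sched_enc.step(); sched_dec.step()
#   opt_enc.zero_grad(); opt_dec.zero_grad()
```

Keeping the encoder on a gentler schedule is meant to preserve its pretrained weights while the decoder, starting from random initialization, is trained with more aggressive updates, which is the mismatch the abstract refers to.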