Adapting Pretrained Text-to-Text Models for Long Text Sequences

We present an empirical study of adapting an existing pretrained text-to-text model for long-sequence inputs. Through a comprehensive study along three axes of the pretraining pipeline (model architecture, optimization objective, and pretraining corpus), we propose an effective recipe to build long-context models from existing short-context models. Specifically, we replace the full attention in transformers with pooling-augmented blockwise attention, and pretrain the model with a masked-span prediction task with spans of varying length. In terms of the pretraining corpus, we find that using randomly concatenated short documents from a large open-domain corpus results in better performance than using existing long-document corpora, which are typically limited in their domain coverage. With these findings, we build a long-context model that achieves competitive performance on long-text QA tasks and establishes a new state of the art on five long-text summarization datasets, often outperforming previous methods with larger model sizes. Our code has been released at https://github.com/facebookresearch/bart_ls.
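Two of the ingredients named in the abstract, randomly concatenating short documents and masking spans of varying length, can be illustrated with a minimal sketch. This is not the released bart_ls code: the function names, the separator id, the sentinel ids, the `start_prob` parameter, and the sentinel-style target format are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def pack_short_documents(
    docs: Sequence[List[int]],
    target_len: int,
    sep_id: int,
    rng: random.Random,
) -> List[List[int]]:
    """Shuffle short documents and concatenate them into sequences
    of at most target_len tokens (hypothetical helper)."""
    order = list(range(len(docs)))
    rng.shuffle(order)
    packed: List[List[int]] = []
    current: List[int] = []
    for i in order:
        piece = docs[i] + [sep_id]
        # Start a new sequence when the next document would overflow.
        if current and len(current) + len(piece) > target_len:
            packed.append(current)
            current = []
        # A single over-long document is truncated to target_len.
        current.extend(piece[:target_len])
    if current:
        packed.append(current)
    return packed


def mask_spans(
    tokens: List[int],
    span_lengths: Sequence[int],
    start_prob: float,
    first_sentinel: int,
    rng: random.Random,
) -> Tuple[List[int], List[int]]:
    """Replace non-overlapping spans of varying length with sentinel ids.

    Returns (source, target): the source has each masked span replaced by
    one sentinel; the target lists each sentinel followed by the tokens
    it hides. The sentinel format is an assumption, not the paper's.
    """
    source: List[int] = []
    target: List[int] = []
    pos, n_spans = 0, 0
    while pos < len(tokens):
        if rng.random() < start_prob:
            length = rng.choice(list(span_lengths))  # varying span length
            sentinel = first_sentinel + n_spans
            source.append(sentinel)
            target.append(sentinel)
            target.extend(tokens[pos:pos + length])
            pos += length
            n_spans += 1
            # Keep one visible token after a span so sentinels never touch.
            if pos < len(tokens):
                source.append(tokens[pos])
                pos += 1
        else:
            source.append(tokens[pos])
            pos += 1
    return source, target
```

In this sketch, `pack_short_documents` would be applied to tokenized short documents to build long training sequences, and `mask_spans` would then produce the encoder input and decoder target for each packed sequence. The span lengths and the span start probability would need to be tuned; the values used in the paper are not stated in the abstract.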

