Prior work on language model pre-training has explored different architectures and learning objectives, but differences in data, hyperparameters and evaluation make a principled comparison difficult. In this work, we focus on bidirectionality as a key factor that differentiates existing approaches, and present a comprehensive study of its role in next token prediction, text infilling, zero-shot priming and fine-tuning. We propose a new framework that generalizes prior approaches, including fully unidirectional models like GPT, fully bidirectional models like BERT, and hybrid models like CM3 and prefix LM. Our framework distinguishes between two notions of bidirectionality (bidirectional context and bidirectional attention) and allows us to control each of them separately. We find that the optimal configuration is largely application-dependent (e.g., bidirectional attention is beneficial for fine-tuning and infilling, but harmful for next token prediction and zero-shot priming). We train models with up to 6.7B parameters, and find differences to remain consistent at scale. While prior work on scaling has focused on left-to-right autoregressive models, our results suggest that this approach comes with some trade-offs, and it might be worthwhile to develop very large bidirectional models.
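To make the attention-side notion concrete, the sketch below shows one common way to control bidirectional attention at the mask level, in the spirit of a prefix LM. This is an illustrative assumption, not the paper's actual framework or code; the function name `build_attention_mask` and the `n_prefix` parameter are hypothetical.

```python
# Illustrative sketch only: a toy mask mixing bidirectional and causal
# attention, as in a prefix LM. Names and parameters are hypothetical.
import torch

def build_attention_mask(seq_len: int, n_prefix: int) -> torch.Tensor:
    """Return a boolean (seq_len, seq_len) mask where True means that
    query position i may attend to key position j.

    Prefix positions (< n_prefix) attend bidirectionally within the prefix;
    the remaining positions attend causally to themselves and earlier tokens.
    """
    # Causal (lower-triangular) mask: position i sees positions <= i.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Make the prefix block dense so prefix tokens also see later prefix tokens.
    mask[:n_prefix, :n_prefix] = True
    return mask

if __name__ == "__main__":
    # With a 3-token prefix in a 6-token sequence, the top-left 3x3 block is
    # fully attended (bidirectional), while the rest remains causal.
    print(build_attention_mask(6, 3).int())
```

Setting `n_prefix = 0` recovers a fully causal (GPT-style) mask, and `n_prefix = seq_len` a fully bidirectional (BERT-style) one, which is why a single mask parameter suffices to interpolate between the two extremes in this toy setting.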