PAGnol: An Extra-Large French Generative Model

Access to large pre-trained models of varied architectures, in many different languages, is central to the democratization of NLP. We introduce PAGnol, a collection of French GPT models. Using scaling laws, we efficiently train PAGnol-XL (1.5B parameters) with the same computational budget as CamemBERT, a model 13 times smaller. PAGnol-XL is the largest model trained to date for the French language. We plan to train increasingly large and capable versions of PAGnol, exploring the capabilities of French extreme-scale models. For this first release, we focus on the pre-training and scaling calculations underlying PAGnol. We fit a scaling law for compute for the French language, and compare it with its English counterpart. We find that the pre-training dataset significantly conditions the quality of the outputs, with common datasets such as OSCAR leading to low-quality, offensive text. We evaluate our models on discriminative and generative tasks in French, comparing them to other state-of-the-art French and multilingual models, and reaching the state of the art in the abstractive summarization task. Our research was conducted on the public GENCI Jean Zay supercomputer, and our models up to the Large size are made publicly available.
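
As a rough illustration of what fitting a scaling law for compute can look like, the sketch below fits a power law of the form L(C) = (C_c / C)^alpha by linear regression in log-log space. The functional form is an assumption borrowed from the English-language scaling-law literature, not something the abstract specifies; the function name `fit_power_law` and every number in the example are hypothetical and synthetic, not results from PAGnol.

```python
import numpy as np

def fit_power_law(compute, loss):
    """Fit loss ~ (c_crit / compute) ** alpha by least squares in log-log space.

    Assumed form (not taken from the abstract):
        log L = -alpha * log C + alpha * log c_crit
    """
    log_c = np.log(np.asarray(compute, dtype=float))
    log_l = np.log(np.asarray(loss, dtype=float))
    # A degree-1 polynomial fit returns (slope, intercept).
    slope, intercept = np.polyfit(log_c, log_l, 1)
    alpha = -slope
    c_crit = np.exp(intercept / alpha)
    return alpha, c_crit

# Synthetic, noise-free points generated from a known law.
# These are placeholders for illustration, NOT measurements from the paper.
true_alpha, true_c_crit = 0.05, 2.0e8
compute = np.array([1e-3, 1e-2, 1e-1, 1.0, 10.0])  # arbitrary compute units
loss = (true_c_crit / compute) ** true_alpha

alpha, c_crit = fit_power_law(compute, loss)
print(f"alpha = {alpha:.4f}, c_crit = {c_crit:.3e}")
```

On noise-free synthetic data the fit recovers the exponent and constant used to generate the points. With real training runs, the points would be (compute, loss) pairs from models of different sizes, and exponents fitted on different languages or datasets could then be compared.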

