Recent work such as GPT-3 has demonstrated excellent Zero-Shot and Few-Shot performance on many natural language processing (NLP) tasks by scaling up model size, dataset size, and the amount of computation. However, training a model like GPT-3 requires a huge amount of computational resources, which makes it challenging for researchers. In this work, we propose a method that incorporates large-scale distributed training performance into model architecture design. With this method, Yuan 1.0, currently the largest singleton language model with 245B parameters, achieves excellent performance on thousands of GPUs during training, as well as state-of-the-art results on NLP tasks. A data processing method is designed to efficiently filter a massive amount of raw data. Based on this method, we build the current largest high-quality Chinese corpus, containing 5TB of text. In addition, a calibration and label expansion method is proposed to improve Zero-Shot and Few-Shot performance, and steady improvements in accuracy are observed on various tasks. Yuan 1.0 presents a strong capacity for natural language generation, and the generated articles are difficult to distinguish from human-written ones.