Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE

This technical report briefly describes Vega v1, the JDExplore d-team's submission to the General Language Understanding Evaluation (GLUE) leaderboard. GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference. [Method] We investigate several effective strategies and adopt their best combination as our training recipe. For the model structure, we use the vanilla Transformer with disentangled attention as the basic encoder block. For self-supervised training, we use a representative denoising objective (i.e., replaced token detection) in phase 1 and combine it with a contrastive objective (i.e., sentence embedding contrastive learning) in phase 2. During fine-tuning, we adopt several advanced techniques, such as transductive fine-tuning, self-calibrated fine-tuning, and adversarial fine-tuning. [Results] According to our submission record (Jan. 2022), with our optimized pretraining and fine-tuning strategies, our 1.3-billion-parameter model sets a new state of the art on 4 of the 9 tasks and achieves the best average score of 91.3. Encouragingly, Vega v1 is the first to exceed human performance on two challenging tasks, SST-2 and WNLI. We believe our empirically successful recipe, built from a bag of tricks, could shed new light on developing efficient discriminative large language models.
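As an illustration of the phase 1 objective: in replaced token detection, some input tokens are swapped for alternatives, and a discriminator is trained to predict, for each position, whether the token is original or replaced. The sketch below only shows how such per-position labels could be built. It is not the Vega v1 implementation: the function name `corrupt_tokens`, the `replace_prob` parameter, and uniform sampling from a toy vocabulary (standing in for a learned generator) are all assumptions made for illustration.

```python
import random


def corrupt_tokens(tokens, vocab, replace_prob=0.15, rng=None):
    """Build a toy replaced-token-detection example (hypothetical helper).

    Returns (corrupted, labels), where labels[i] is 1 if position i now
    holds a token different from the original and 0 otherwise.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            # Assumption: uniform sampling stands in for a learned generator.
            new_tok = rng.choice(vocab)
        else:
            new_tok = tok
        corrupted.append(new_tok)
        # A sampled token that happens to equal the original counts as original.
        labels.append(int(new_tok != tok))
    return corrupted, labels


if __name__ == "__main__":
    sentence = ["the", "movie", "was", "surprisingly", "good"]
    toy_vocab = ["the", "movie", "was", "bad", "good", "film"]
    corrupted, labels = corrupt_tokens(sentence, toy_vocab, replace_prob=0.3)
    print(corrupted)
    print(labels)
```

A discriminator would then be trained with a per-token binary classification loss against `labels`. The phase 2 contrastive term mentioned above is not covered by this sketch.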
