This technical report briefly describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard. SuperGLUE is more challenging than the widely used General Language Understanding Evaluation (GLUE) benchmark, comprising eight difficult language understanding tasks, including question answering, natural language inference, word sense disambiguation, coreference resolution, and reasoning. [Method] Instead of arbitrarily increasing the size of a pretrained language model (PLM), our aim is to 1) fully extract knowledge from the input pretraining data under a given parameter budget, e.g., 6B, and 2) effectively transfer this knowledge to downstream tasks. To achieve goal 1), we propose self-evolution learning for PLMs to wisely predict the informative tokens that should be masked, and to supervise the masked language modeling (MLM) process with rectified smooth labels. For goal 2), we leverage the prompt transfer technique to improve low-resource tasks by transferring knowledge from the foundation model and related downstream tasks to the target task. [Results] According to our submission record (Oct. 2022), with our optimized pretraining and fine-tuning strategies, our 6B Vega method achieved new state-of-the-art performance on 4/8 tasks, topping the SuperGLUE leaderboard on Oct. 8, 2022, with an average score of 91.3.
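The abstract only names the rectified smooth label supervision for MLM; the exact formulation is not given here. The sketch below is a minimal, illustrative guess (not the report's actual method) at how such a loss could look: the one-hot target at each masked position is mixed with the model's own detached prediction distribution, with a hypothetical mixing weight `alpha`.

```python
import torch
import torch.nn.functional as F


def rectified_smooth_label_loss(logits, target_ids, alpha=0.1):
    """Illustrative MLM loss with rectified smooth labels (assumed form).

    logits:     [num_masked, vocab_size] predictions at masked positions
    target_ids: [num_masked] ground-truth token ids at masked positions
    alpha:      hypothetical mixing weight for the model's own distribution
    """
    vocab_size = logits.size(-1)
    one_hot = F.one_hot(target_ids, vocab_size).float()
    # Self-evolution signal: the model's current belief, detached so it acts
    # as a soft label rather than receiving gradients.
    self_probs = F.softmax(logits, dim=-1).detach()
    smooth_labels = (1.0 - alpha) * one_hot + alpha * self_probs
    log_probs = F.log_softmax(logits, dim=-1)
    # Cross-entropy against the rectified smooth labels.
    return -(smooth_labels * log_probs).sum(dim=-1).mean()


if __name__ == "__main__":
    # Toy usage: 4 masked positions over a 100-token vocabulary.
    logits = torch.randn(4, 100)
    targets = torch.randint(0, 100, (4,))
    print(rectified_smooth_label_loss(logits, targets).item())
```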