In this work, we try to decipher the internal connections of NLP technology development over the past decades, searching for its essence, which rewards us with a (potential) new learning paradigm for NLP tasks, dubbed reStructured Pre-training (RST). In such a paradigm, the role of data is re-emphasized, and model pre-training and fine-tuning on downstream tasks are viewed as a process of data storing and accessing. Based on that, we operationalize the simple principle that a good storage mechanism should not only be able to cache a large amount of data but also consider the ease of access. After overcoming several engineering challenges, we achieve this by pre-training models over restructured data that consist of a variety of valuable information instead of raw data. Experimentally, RST models not only surpass strong competitors (e.g., T0) on 52/55 popular datasets from a variety of NLP tasks, but also achieve superior performance on the National College Entrance Examination - English (Gaokao-English), the most authoritative examination in China. Specifically, the proposed system Qin scores 40 points higher than the average student score and 15 points higher than GPT3, with 1/16 of its parameters. In particular, Qin achieves a high score of 138.5 (full marks: 150) on the 2018 English exam (national paper III). We have released the Gaokao Benchmark with an online submission platform. In addition, we test our model on the 2022 College Entrance Examination English that was held a few days ago (2022.06.08), and it obtains a total score of 134 (vs. GPT3's 108).
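To make the "restructured data" idea concrete, below is a minimal, hypothetical sketch in Python of how raw records might be converted into prompted (input, output) signal pairs that a model can both store during pre-training and access later via prompts. The record fields, prompt templates, and function names are illustrative assumptions for exposition, not the paper's actual pipeline.

```python
# A minimal sketch (not the authors' implementation) of restructuring raw data
# into prompted (input, target) signal pairs for pre-training.
# All record fields and prompt templates below are hypothetical.

from typing import Dict, List, Tuple

# Hypothetical prompt templates, keyed by the kind of signal mined from raw data.
TEMPLATES: Dict[str, str] = {
    "entity_typing": "What type of entity is {entity} in: \"{sentence}\"?",
    "summarization": "Summarize the following text: {text}",
}


def restructure(record: Dict[str, str], signal: str) -> Tuple[str, str]:
    """Turn one raw record into a prompted (input, target) training pair."""
    prompt = TEMPLATES[signal].format(**record)
    return prompt, record["target"]


# Example usage with toy records.
raw: List[Tuple[Dict[str, str], str]] = [
    ({"entity": "Seine", "sentence": "The Seine flows through Paris.",
      "target": "river"}, "entity_typing"),
    ({"text": "RST pre-trains over restructured data rather than raw text.",
      "target": "RST stores varied signals in a form that is easy to access."},
     "summarization"),
]

training_pairs = [restructure(rec, sig) for rec, sig in raw]
for inp, tgt in training_pairs:
    print(inp, "->", tgt)
```

Under this view, pre-training over such pairs is the "storing" step, and prompting the trained model with the same templates is the "accessing" step.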