Many recent studies on large-scale language models have reported successful in-context zero- and few-shot learning ability. However, in-depth analysis of when in-context learning occurs is still lacking. For example, it is unknown how in-context learning performance changes as the pretraining corpus varies. Here, we investigate the effects of the source and size of the pretraining corpus on in-context learning in HyperCLOVA, a Korean-centric GPT-3 model. From our in-depth investigation, we report the following observations: (1) in-context learning performance depends heavily on the corpus domain source, and the size of the pretraining corpus does not necessarily determine the emergence of in-context learning; (2) in-context learning ability can emerge when a language model is trained on a combination of multiple corpora, even when each corpus alone does not yield in-context learning; (3) pretraining with a corpus related to a downstream task does not always guarantee competitive in-context learning performance on that task, especially in the few-shot setting; and (4) language modeling quality (measured in perplexity) does not always correlate with in-context learning ability: e.g., low perplexity does not always imply strong in-context few-shot learning performance.
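For finding (4), perplexity is the standard intrinsic measure of language modeling quality; a minimal formulation over a held-out token sequence w_1, ..., w_N (notation assumed here for illustration, not taken from the paper) is

\[
\mathrm{PPL}(w_{1:N}) = \exp\!\Big(-\frac{1}{N}\sum_{i=1}^{N} \log p_\theta\big(w_i \mid w_{<i}\big)\Big),
\]

so lower perplexity means the model assigns higher average likelihood to held-out text; finding (4) states that this intrinsic fit does not reliably predict in-context few-shot accuracy.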