We study how in-context learning (ICL) in language models is affected by semantic priors versus input-label mappings. We investigate two setups, ICL with flipped labels and ICL with semantically-unrelated labels, across various model families (GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM). First, experiments on ICL with flipped labels show that overriding semantic priors is an emergent ability of model scale. While small language models ignore flipped labels presented in-context and thus rely primarily on semantic priors from pretraining, large models can override semantic priors when presented with in-context exemplars that contradict priors, despite the stronger semantic priors that larger models may hold. We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task. The ability to do SUL-ICL also emerges primarily with scale, and large-enough language models can even perform linear classification in a SUL-ICL setting. Finally, we evaluate instruction-tuned models and find that instruction tuning strengthens both the use of semantic priors and the capacity to learn input-label mappings, but more of the former.
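To make the three evaluation settings concrete, the following is a minimal sketch of how the prompts differ; the exemplars, label names, and helper function are illustrative assumptions, not the paper's actual data or code.

```python
# Hypothetical prompt construction for the three ICL settings described above.
def build_prompt(exemplars, query, label_map):
    """Format in-context exemplars followed by an unlabeled query."""
    lines = [f"Input: {text}\nLabel: {label_map[y]}\n" for text, y in exemplars]
    lines.append(f"Input: {query}\nLabel:")
    return "\n".join(lines)

exemplars = [("I loved this movie", 1), ("Terrible acting", 0)]

# Standard ICL: labels agree with semantic priors from pretraining.
standard = build_prompt(exemplars, "Great soundtrack",
                        {0: "negative", 1: "positive"})

# Flipped labels contradict priors; a model must override its priors
# (an ability that emerges with scale) to follow the in-context mapping.
flipped = build_prompt(exemplars, "Great soundtrack",
                       {0: "positive", 1: "negative"})

# SUL-ICL: labels carry no semantic content, so the model can only succeed
# by learning the input-label mapping from the exemplars themselves.
sul_icl = build_prompt(exemplars, "Great soundtrack",
                       {0: "foo", 1: "bar"})
```

In the flipped setting, accuracy is scored against the flipped labels, so a model that keeps predicting according to its priors scores below chance.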