We introduce a new in-context learning paradigm to measure Large Language Models' (LLMs) ability to learn novel words during inference. In particular, we rewrite Winograd-style co-reference resolution problems by replacing the key concept word with a synthetic but plausible word that the model must understand to complete the task. Solving this task requires the model to make use of the dictionary definition of the new word given in the prompt. This benchmark addresses word acquisition, one important aspect of the diachronic degradation known to afflict LLMs. As LLMs are frozen in time at the moment they are trained, they are normally unable to reflect the way language changes over time. We show that LLM accuracy on our benchmark decreases radically compared to the original Winograd tasks, thus identifying a limitation of current models and providing a benchmark to measure future improvements in LLMs' ability to do in-context learning.
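To make the construction concrete, the sketch below shows how one such prompt might be assembled. This is a minimal illustration of the paradigm described above, not the paper's actual pipeline: the nonce word "plof", the example sentence, and the definition text are all hypothetical stand-ins for real dataset items.

```python
# Hypothetical sketch of the benchmark construction: a Winograd-style
# co-reference problem whose key concept word ("suitcase") has been
# replaced by a synthetic word ("plof"), with the word's dictionary
# definition supplied in the prompt.

def build_prompt(definition: str, sentence: str, question: str) -> str:
    """Prepend the dictionary definition of the nonce word to a
    Winograd-style co-reference resolution problem."""
    return f"Definition: {definition}\n\n{sentence}\n{question}"

# Original item: "The trophy doesn't fit into the brown suitcase
# because it is too large. What is too large?"
prompt = build_prompt(
    definition='A "plof" is a container used to carry clothes and other belongings when travelling.',
    sentence="The trophy doesn't fit into the brown plof because it is too large.",
    question="What is too large, the trophy or the plof?",
)
print(prompt)
```

Solving the rewritten item requires binding the unseen word "plof" to its in-prompt definition; a model that merely pattern-matches on familiar surface forms cannot fall back on prior knowledge of the replaced concept.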