This paper examines the encoding of analogy in large-scale pretrained language models, such as BERT and GPT-2. Existing analogy datasets typically focus on a limited set of analogical relations, with high similarity between the two domains across which the analogy holds. As a more realistic setup, we introduce the Scientific and Creative Analogy dataset (SCAN), a novel analogy dataset containing systematic mappings of multiple attributes and relational structures across dissimilar domains. Using this dataset, we test the analogical reasoning capabilities of several widely-used pretrained language models (LMs). We find that state-of-the-art LMs achieve low performance on these complex analogy tasks, highlighting the challenges still posed by analogy understanding.