The ability to learn and predict simple functions is a key aspect of human intelligence. Recent works have begun to explore this ability using transformer architectures; however, it remains unclear whether transformers alone are sufficient to recapitulate the extrapolation abilities that people display in this domain. Here, we propose to address this gap by augmenting the transformer architecture with two simple inductive learning biases directly adapted from recent models of abstract reasoning in cognitive science. The results we report demonstrate that these biases are helpful in the context of large neural network models, and they also shed light on the types of inductive learning biases that may contribute to human abilities in extrapolation.