The successes of deep learning critically rely on the ability of neural networks to output meaningful predictions on unseen data -- generalization. Yet despite its criticality, there remain fundamental open questions on how neural networks generalize. How much do neural networks rely on memorization -- seeing highly similar training examples -- and how much are they capable of human-style reasoning -- identifying abstract rules underlying the data? In this paper we introduce a novel benchmark, Pointer Value Retrieval (PVR) tasks, that explores the limits of neural network generalization. While PVR tasks can consist of visual as well as symbolic inputs, each with varying levels of difficulty, they all share a simple underlying rule: one part of the input acts as a pointer, giving the location of a different part of the input, which forms the value (and output). We demonstrate that this task structure provides a rich testbed for understanding generalization, with our empirical study showing large variations in neural network performance based on dataset size, task complexity and model architecture. The interaction of position, values and the pointer rule also allows the development of nuanced tests of generalization, by introducing distribution shift and increasing functional complexity. These reveal both subtle failures and surprising successes, suggesting many promising directions of exploration on this benchmark.
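To make the pointer/value rule concrete, here is a minimal sketch of generating a symbolic PVR example. It assumes the simplest instance described above: the first digit of the input is the pointer, and the label is the digit at the pointed-to position among the remaining values. The function name and parameters are illustrative, not from the paper.

```python
import random


def make_pvr_example(n_values=10, rng=random):
    """Generate one symbolic PVR (Pointer Value Retrieval) example.

    The input is a digit sequence whose first entry is the pointer;
    the label is the value at the pointed-to position among the
    remaining entries. (Illustrative sketch, not the paper's code.)
    """
    pointer = rng.randrange(n_values)              # which position to retrieve
    values = [rng.randrange(10) for _ in range(n_values)]  # candidate values
    x = [pointer] + values                         # full input sequence
    y = values[pointer]                            # label: the retrieved value
    return x, y


x, y = make_pvr_example()
```

Under this construction, a model can only generalize by learning the abstract retrieval rule rather than memorizing pointer–value co-occurrences; harder variants (e.g. visual inputs, or applying a function to several retrieved values) keep the same pointer structure while increasing functional complexity.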