This paper considers the learning of logical (Boolean) functions with a focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization. This is motivated by the fact that the rich combinatorial nature of data in certain reasoning tasks (e.g., arithmetic/logic) makes representative data sampling challenging, and learning successfully under GOTU gives a first vignette of an 'extrapolating' or 'reasoning' learner. We then study how different network architectures trained by (S)GD perform under GOTU and provide both theoretical and experimental evidence that for a class of network models including instances of Transformers, random features models, and diagonal linear networks, a min-degree interpolator (MDI) is learned on the unseen. We also provide evidence that other instances, such as networks trained with larger learning rates or mean-field networks, reach leaky MDIs. These findings lead to two implications: (1) we provide an explanation for the length generalization problem (e.g., Anil et al. 2022); (2) we introduce a curriculum learning algorithm called Degree-Curriculum that learns monomials more efficiently by incrementing the supports of training inputs.
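To make the MDI notion concrete, the following is a minimal NumPy sketch (an illustration under an assumed canonical holdout, not the paper's code): training covers the half-cube where the first coordinate is frozen to +1, so every monomial containing x1 collapses on the seen data, and substituting x1 = +1 in the Fourier-Walsh expansion yields a lower-degree interpolator of the seen points. The 3-parity target and variable names are illustrative.

```python
import itertools
import numpy as np

n = 3
f = lambda x: np.prod(x)  # target: the 3-parity x1*x2*x3 on {-1,+1}^n

# Canonical GOTU holdout (illustrative): train on the half-cube x1 = +1,
# test on the unseen half-cube x1 = -1.
cube = np.array(list(itertools.product([-1, 1], repeat=n)))
seen, unseen = cube[cube[:, 0] == 1], cube[cube[:, 0] == -1]

# Fourier-Walsh basis: one monomial chi_S(x) = prod_{i in S} x_i per subset S.
subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
chi = lambda X: np.stack([np.prod(X[:, list(S)], axis=1) for S in subsets], axis=1)

# Fourier coefficients of f (orthonormal basis under the uniform measure).
coeffs = chi(cube).T @ np.array([f(x) for x in cube]) / len(cube)

# On the seen half-cube, monomials containing x1 collapse to their x1-free
# counterparts, so substituting x1 = +1 still interpolates the seen data
# while strictly lowering those degrees: this gives the MDI for this holdout.
mdi_coeffs = np.zeros(len(subsets))
for c, S in zip(coeffs, subsets):
    mdi_coeffs[subsets.index(tuple(i for i in S if i != 0))] += c

mdi = lambda X: chi(X) @ mdi_coeffs
print(np.allclose(mdi(seen), [f(x) for x in seen]))   # True: fits seen data
print(mdi(unseen), [f(x) for x in unseen])            # +x2*x3 vs. true -x2*x3
```

The last two lines show the characteristic GOTU failure mode: the MDI fits the seen half-cube exactly but predicts +x2*x3 on the unseen half-cube, where the true parity is -x2*x3.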
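The Degree-Curriculum idea can be sketched as follows; this is a hedged illustration, where the architecture, learning rate, stage lengths, and the 4-parity target are placeholder choices rather than the paper's exact setup. Training inputs are sampled with a capped support size (the number of coordinates flipped from the all-(+1) base point), and the cap is incremented stage by stage so that lower-degree monomials are fitted before higher-degree ones.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n = 8
parity = lambda x: x[:, :4].prod(dim=1, keepdim=True)  # target: a 4-parity

def sample_with_support(batch, max_support):
    """Boolean inputs in {-1,+1}^n with at most `max_support` coordinates
    flipped to -1 (support measured from the all-(+1) base point)."""
    x = torch.ones(batch, n)
    for row in x:
        k = torch.randint(0, max_support + 1, (1,)).item()
        idx = torch.randperm(n)[:k]
        row[idx] = -1.0
    return x

model = nn.Sequential(nn.Linear(n, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# Degree-Curriculum (illustrative schedule): increment the allowed support
# size stage by stage, so monomials of growing degree become learnable.
for max_support in range(1, n + 1):
    for _ in range(200):
        x = sample_with_support(64, max_support)
        loss = ((model(x) - parity(x)) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"support <= {max_support}: loss {loss.item():.4f}")
```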