Neural networks have achieved impressive results on many technological and scientific tasks. Yet, their empirical successes have outpaced our fundamental understanding of their structure and function. By identifying mechanisms driving the successes of neural networks, we can provide principled approaches for improving neural network performance and develop simple and effective alternatives. In this work, we isolate the key mechanism driving feature learning in fully connected neural networks by connecting neural feature learning to the average gradient outer product. We subsequently leverage this mechanism to design \textit{Recursive Feature Machines} (RFMs), which are kernel machines that learn features. We show that RFMs (1) accurately capture features learned by deep fully connected neural networks, (2) close the gap between kernel machines and fully connected networks, and (3) surpass a broad spectrum of models including neural networks on tabular data. Furthermore, we demonstrate that RFMs shed light on recently observed deep learning phenomena such as grokking, lottery tickets, simplicity biases, and spurious features. We provide a Python implementation to make our method broadly accessible [\href{https://github.com/aradha/recursive_feature_machines}{GitHub}].
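To make the mechanism concrete, the following is a minimal NumPy sketch of the alternation described above: fit a kernel machine, estimate its average gradient outer product (AGOP), and reuse that matrix as the kernel's feature (Mahalanobis) matrix. This is an illustration under simplifying assumptions, not the released implementation linked above: the Gaussian Mahalanobis kernel (chosen for its simple closed-form gradient), the ridge regularization, the trace normalization, and all function names (\texttt{gaussian\_M\_kernel}, \texttt{fit\_predictor}, \texttt{agop}, \texttt{rfm}) are hypothetical choices for exposition.
\begin{verbatim}
import numpy as np

def gaussian_M_kernel(X, Z, M, gamma=1.0):
    """Mahalanobis Gaussian kernel K(x, z) = exp(-gamma (x-z)^T M (x-z))."""
    XM = X @ M
    dists = (np.sum(XM * X, axis=1)[:, None]
             - 2 * XM @ Z.T
             + np.sum((Z @ M) * Z, axis=1)[None, :])
    return np.exp(-gamma * np.clip(dists, 0.0, None))

def fit_predictor(X, y, M, gamma=1.0, reg=1e-3):
    """Kernel ridge regression with the current feature matrix M; y has shape (n,)."""
    K = gaussian_M_kernel(X, X, M, gamma)
    return np.linalg.solve(K + reg * np.eye(len(X)), y)

def agop(X_eval, X_train, alpha, M, gamma=1.0):
    """Average gradient outer product of f(x) = sum_j alpha_j K_M(x, x_j)."""
    d = X_eval.shape[1]
    G = np.zeros((d, d))
    K = gaussian_M_kernel(X_eval, X_train, M, gamma)   # shape (n_eval, n_train)
    for i in range(len(X_eval)):
        # grad f(x_i) = -2 gamma * sum_j alpha_j K(x_i, x_j) M (x_i - x_j)
        diffs = (X_eval[i] - X_train) @ M              # rows are (x_i - x_j)^T M
        grad = -2 * gamma * (alpha * K[i]) @ diffs
        G += np.outer(grad, grad)
    return G / len(X_eval)

def rfm(X, y, iters=5, gamma=1.0, reg=1e-3):
    """Alternate between fitting a kernel machine and updating M with the AGOP."""
    n, d = X.shape
    M = np.eye(d)
    for _ in range(iters):
        alpha = fit_predictor(X, y, M, gamma, reg)
        M = agop(X, X, alpha, M, gamma)                # some variants use a matrix square root here
        M /= np.trace(M) / d                           # keep M on a stable scale
    alpha = fit_predictor(X, y, M, gamma, reg)         # refit with the final feature matrix
    return M, alpha
\end{verbatim}
The learned matrix \texttt{M} plays the role of the features in the sketch: directions with large eigenvalues are those along which the fitted predictor varies most, which is the sense in which the resulting kernel machine ``learns features.''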