Neural networks trained on large datasets by minimizing a loss have become the state-of-the-art approach for solving data science problems, particularly in computer vision, image processing and natural language processing. In spite of their striking results, our theoretical understanding of how neural networks operate is limited. In particular, what are the interpolation capabilities of trained neural networks? In this paper we discuss a theorem of Domingos stating that "every machine learned by continuous gradient descent is approximately a kernel machine". According to Domingos, this fact leads to the conclusion that all machines trained on data are mere kernel machines. We first extend Domingos' result to the discrete case and to networks with vector-valued output. We then study its relevance and significance on simple examples. We find that in simple cases, the "neural tangent kernel" arising in Domingos' theorem does provide understanding of the networks' predictions. Furthermore, when the task given to the network grows in complexity, the interpolation capability of the network can be effectively explained by Domingos' theorem, and is therefore limited. We illustrate this fact on a classic perception theory problem: recovering a shape from its boundary.
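For reference, here is a minimal sketch of the identity behind Domingos' theorem as we read it, in the continuous-time (gradient-flow) setting with scalar output and training loss $L = \sum_{i=1}^m L(y_i^*, y_i)$ on examples $x_1,\dots,x_m$:
\[
\frac{d\,y(x)}{dt} \;=\; -\sum_{i=1}^m \frac{\partial L}{\partial y_i}\,\nabla_w y(x)\cdot\nabla_w y(x_i),
\]
so that, integrating along the curve $c(t)$ traced by the weights during training,
\[
y(x) \;=\; \sum_{i=1}^m a_i\,K^{f,c}(x, x_i) \;+\; b,
\qquad
K^{f,c}(x, x') \;=\; \int_{c} \nabla_w y(x)\cdot\nabla_w y(x')\,dt,
\]
where $K^{f,c}$ is the path kernel (the tangent kernel integrated along the training trajectory), $a_i$ is minus the loss derivative $\partial L/\partial y_i$ averaged along the path with tangent-kernel weights, and $b = y_0(x)$ is the output of the network at initialization. This is the sense in which the trained network is "approximately a kernel machine".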