Deep learning's successes are often attributed to its ability to automatically discover new representations of the data, rather than relying on handcrafted features like other learning methods. We show, however, that deep networks learned by the standard gradient descent algorithm are in fact mathematically approximately equivalent to kernel machines, a learning method that simply memorizes the data and uses it directly for prediction via a similarity function (the kernel). This greatly enhances the interpretability of deep network weights, by elucidating that they are effectively a superposition of the training examples. The network architecture incorporates knowledge of the target function into the kernel. This improved understanding should lead to better learning algorithms.
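The result behind this equivalence, stated informally, is that a model $y(x)$ trained by gradient descent can be written as $y(x) = \sum_i a_i K(x, x_i) + b$, where $K$ is the "path kernel" obtained by integrating the tangent kernel $\nabla_w y(x) \cdot \nabla_w y(x')$ along the trajectory the weights take during training, the coefficients $a_i$ come from the loss derivatives along that path, and $b$ absorbs the initial model's prediction. The sketch below is a minimal numerical check of the discrete analogue: at each gradient step, the change in the network's prediction at a test point is, to first order, a kernel-weighted sum over the training examples. Everything here (the tiny tanh network f, the data X and Y, the squared-error loss, the step size eta) is an illustrative assumption, not the paper's setup.

```python
import jax
import jax.numpy as jnp

# Tiny one-hidden-layer network; f(w, x) returns a scalar prediction.
def f(w, x):
    h = jnp.tanh(x @ w["W1"])
    return jnp.dot(h, w["w2"])

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
w = {"W1": 0.5 * jax.random.normal(k1, (2, 16)),
     "w2": 0.5 * jax.random.normal(k2, (16,))}

# Toy training set and a held-out test point.
X = jax.random.normal(k3, (8, 2))
Y = jnp.sin(X[:, 0])            # arbitrary smooth target
x_test = jnp.array([0.3, -0.7])

loss = lambda w: 0.5 * jnp.mean((jax.vmap(lambda x: f(w, x))(X) - Y) ** 2)

grad_f = jax.grad(f)            # gradient of the output w.r.t. the weights
flat = lambda g: jnp.concatenate(
    [v.ravel() for v in jax.tree_util.tree_leaves(g)])

eta, steps = 0.05, 200
kernel_pred = f(w, x_test)      # the b term: the initial model's prediction
for _ in range(steps):
    g_test = flat(grad_f(w, x_test))
    # Per-example loss derivatives l'_i under the squared-error loss.
    preds = jax.vmap(lambda x: f(w, x))(X)
    lprime = (preds - Y) / len(X)
    for i in range(len(X)):
        # Tangent kernel K(x_test, x_i) at the current weights.
        k_i = jnp.dot(g_test, flat(grad_f(w, X[i])))
        kernel_pred -= eta * lprime[i] * k_i
    # The actual gradient-descent step on the weights.
    g_loss = jax.grad(loss)(w)
    w = jax.tree_util.tree_map(lambda p, g: p - eta * g, w, g_loss)

print("network prediction:   ", f(w, x_test))
print("path-kernel prediction:", kernel_pred)  # close for small eta
```

With a small step size the two printed values should nearly agree, and the accumulated sum makes the abstract's claim concrete: the trained prediction is a superposition of training-example contributions weighted by a similarity function, even though no kernel was ever written down during training.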