Several recent works have shown separation results between deep neural networks and hypothesis classes with inferior approximation capacity, such as shallow networks or kernel classes. On the other hand, the fact that deep networks can efficiently express a target function does not mean that this target function can be learned efficiently by deep neural networks. In this work we study the intricate connection between learnability and approximation capacity. We show that the learnability of a target function with deep networks depends on the ability of simpler classes to approximate the target. Specifically, we show that a necessary condition for a function to be learnable by gradient descent on deep neural networks is that it can be approximated, at least in a weak sense, by shallow neural networks. We also show that a class of functions can be learned by an efficient statistical query algorithm if and only if it can be approximated in a weak sense by some kernel class. We give several examples of functions that demonstrate depth separation, and conclude that they cannot be efficiently learned, even by a hypothesis class that can efficiently approximate them.
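As an illustration only (the precise definitions are given in the body of the work; the notation here, with target $f$, hypothesis class $\mathcal{H}$, and input distribution $\mathcal{D}$ over $\mathbb{R}^d$, is assumed for this sketch), "approximation in a weak sense" can be read as asking for a non-negligible correlation with the target, rather than a small approximation error:
\[
\exists\, h \in \mathcal{H} \quad \text{such that} \quad \mathbb{E}_{x \sim \mathcal{D}}\big[h(x)\, f(x)\big] \;\ge\; \frac{1}{\mathrm{poly}(d)}.
\]
Under such a reading, the results above relate efficient learnability of $f$ (by gradient descent on deep networks, or by statistical query algorithms) to the existence of a weak approximator in a simpler class (shallow networks or a kernel class, respectively).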