It has been recognized that heavily overparameterized artificial neural networks exhibit surprisingly good generalization performance in various machine-learning tasks. Recent theoretical studies have attempted to unveil the mystery of overparameterization. In most of these works, overparameterization is achieved by increasing the width of the network, while the effect of increasing the depth has remained less well understood. In this work, we investigate the effect of increasing the depth within an overparameterized regime. To gain insight into the advantage of depth, we introduce local and global labels as abstract but simple classification rules. It turns out that the locality of the relevant feature for a given classification rule plays a key role: our experimental results suggest that deeper is better for local labels, whereas shallower is better for global labels. We also compare the results of finite networks with those of the neural tangent kernel (NTK), which is equivalent to an infinitely wide network with a proper initialization and an infinitesimal learning rate. We show that the NTK does not correctly capture the depth dependence of the generalization performance, which indicates the importance of feature learning rather than lazy learning.