It has been recognized that heavily overparameterized deep neural networks (DNNs) exhibit surprisingly good generalization performance in various machine-learning tasks. Although the benefits of depth have been investigated from different perspectives, such as approximation theory and statistical learning theory, existing theories do not adequately explain the empirical success of overparameterized DNNs. In this work, we report a remarkable interplay between depth and the locality of a target function. We introduce $k$-local and $k$-global functions and find that depth is beneficial for learning local functions but detrimental to learning global functions. This interplay is not properly captured by the neural tangent kernel, which describes an infinitely wide neural network in the lazy learning regime.
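The abstract does not spell out how $k$-local and $k$-global functions are defined or how the depth comparison is run; purely as an illustration of the kind of experiment suggested here, the sketch below contrasts a toy target that depends on only $k$ of the $d$ input coordinates with one that couples all coordinates, and fits shallow and deep fully connected networks to each. The specific target functions, network widths, and hyperparameters are assumptions for illustration, not the paper's setup.

```python
# Toy comparison of shallow vs. deep networks on a "local" and a "global" target.
# NOTE: the target functions and hyperparameters below are illustrative assumptions,
# not the definitions used in the paper.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
d, k, n = 20, 3, 2000          # input dimension, locality parameter, sample size

X = rng.uniform(-1.0, 1.0, size=(n, d))

# Hypothetical "k-local" target: depends only on the first k coordinates.
y_local = np.prod(X[:, :k], axis=1)

# Hypothetical "k-global" target: couples all d coordinates.
y_global = np.cos(X.sum(axis=1))

for name, y in [("local", y_local), ("global", y_global)]:
    for depth in [1, 4]:                        # shallow vs. deep
        net = MLPRegressor(hidden_layer_sizes=(64,) * depth,
                           max_iter=2000, random_state=0)
        net.fit(X[:1500], y[:1500])
        r2 = net.score(X[1500:], y[1500:])      # R^2 on held-out data
        print(f"{name} target, depth {depth}: test R^2 = {r2:.3f}")
```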