Deep neural networks can empirically perform efficient hierarchical learning, in which the layers learn useful representations of the data. However, how they make use of these intermediate representations is not explained by recent theories that relate them to "shallow learners" such as kernels. In this work, we demonstrate that intermediate neural representations add flexibility to neural networks and can be advantageous over raw inputs. We consider a fixed, randomly initialized neural network as a representation function fed into another trainable network. When the trainable network is the quadratic Taylor model of a wide two-layer network, we show that neural representations can achieve improved sample complexity compared with the raw input: for learning a low-rank degree-$p$ polynomial ($p \geq 4$) in $d$ dimensions, the neural representation requires only $\tilde{O}(d^{\lceil p/2 \rceil})$ samples, while the best-known sample complexity upper bound for the raw input is $\tilde{O}(d^{p-1})$. We contrast this result with a lower bound showing that neural representations do not improve over the raw input (in the infinite-width limit) when the trainable network is instead a neural tangent kernel. Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
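To make the setup concrete, below is a minimal sketch (not the paper's exact construction) of the two-stage architecture described above: a fixed, randomly initialized ReLU network acts as the representation $h(x)$, and the trainable model on top is the second-order (quadratic) Taylor expansion of a two-layer network in its hidden weights around a random initialization. The widths `d, D, m`, the `tanh` activation, and the scalings are illustrative assumptions, not the values used in the analysis.

```python
# Minimal sketch (assumptions noted above): fixed random representation
# h(x) = ReLU(W0 x), fed into the quadratic Taylor model of a two-layer
# network f(h; A) = (1/sqrt(m)) * a^T act(A h), expanded in A around A0.
import numpy as np

rng = np.random.default_rng(0)
d, D, m = 20, 100, 200            # input dim, representation dim, trainable width

# Fixed, randomly initialized representation network (never trained).
W0 = rng.normal(0, 1 / np.sqrt(d), size=(D, d))
def representation(x):
    return np.maximum(W0 @ x, 0.0)           # h(x) = ReLU(W0 x)

# Two-layer network on top of the representation; a smooth activation is used
# here so the quadratic Taylor expansion in the hidden weights is well defined.
a = rng.choice([-1.0, 1.0], size=m)          # fixed output-layer signs
A0 = rng.normal(0, 1 / np.sqrt(D), size=(m, D))
act = np.tanh
act_d = lambda z: 1.0 - np.tanh(z) ** 2
act_dd = lambda z: -2.0 * np.tanh(z) * (1.0 - np.tanh(z) ** 2)

def quadratic_taylor_model(x, dA):
    """Second-order Taylor expansion of f(x; A0 + dA) in the perturbation dA."""
    h = representation(x)
    z = A0 @ h                               # pre-activations at initialization
    dz = dA @ h                              # first-order change in pre-activations
    # Per neuron: act(z + dz) ~ act(z) + act'(z) dz + 0.5 act''(z) dz^2
    per_neuron = act(z) + act_d(z) * dz + 0.5 * act_dd(z) * dz ** 2
    return (a @ per_neuron) / np.sqrt(m)

x = rng.normal(size=d)
dA = np.zeros((m, D))                        # only dA is trained in this model class
print(quadratic_taylor_model(x, dA))
```

In this sketch only `dA` would be optimized; comparing this model on the representation $h(x)$ versus on the raw input $x$ mirrors the comparison whose sample complexities are quoted in the abstract.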