It is known that $O(N)$ parameters are sufficient for neural networks to memorize arbitrary $N$ input-label pairs. By exploiting depth, we show that $O(N^{2/3})$ parameters suffice to memorize $N$ pairs, under a mild condition on the separation of input points. In particular, deeper networks (even with width $3$) are shown to memorize more pairs than shallow networks, which also agrees with the recent line of work on the benefits of depth for function approximation. We also provide empirical results that support our theoretical findings.
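For intuition, a back-of-the-envelope parameter count (a sketch assuming plain fully connected width-$3$ layers, not necessarily the exact construction used in the proof): each such hidden layer contributes a $3 \times 3$ weight matrix and $3$ biases, so the total parameter count scales linearly with depth $L$, and choosing $L = \Theta(N^{2/3})$ matches the claimed bound:
\[
  \#\mathrm{params} \approx L\,(3 \cdot 3 + 3) = 12L,
  \qquad
  L = \Theta(N^{2/3})
  \;\Longrightarrow\;
  \#\mathrm{params} = \Theta(N^{2/3}),
\]
where the input and output layers add only $O(1)$ additional terms (assuming constant input dimension).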