In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M<N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the population L_2 loss between the outputs of the target and of the compressed network, under the assumption of Gaussian inputs. By using tools from high-dimensional probability, we show that this non-convex problem can be simplified when the target network is sufficiently over-parameterized, and provide the error rate of this approximation as a function of the input dimension and N. In this mean-field limit, the simplified objective, as well as the optimal weights of the compressed network, does not depend on the realization of the target network, but only on expected scaling factors. Furthermore, for networks with ReLU activation, we conjecture that the optimum of the simplified optimization problem is achieved by taking weights on the Equiangular Tight Frame (ETF), while the scaling of the weights and the orientation of the ETF depend on the parameters of the target network. Numerical evidence is provided to support this conjecture.
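To make the setting concrete, the following is a minimal sketch of the population objective described above; the parameterization (outer weights $a_i, b_j$, inner weights $w_i, v_j$, activation $\sigma$) is our own notation and may differ in detail (e.g. biases, fixed versus trained outer weights) from the formulation in the body of the paper:
\[
\min_{\{(b_j,\, v_j)\}_{j=1}^{M}} \;
\mathbb{E}_{x \sim \mathcal{N}(0, I_d)}
\left[
\left(
\sum_{i=1}^{N} a_i\, \sigma(w_i^\top x)
\;-\;
\sum_{j=1}^{M} b_j\, \sigma(v_j^\top x)
\right)^{\!2}
\right],
\]
where $\sigma$ is the activation (e.g. ReLU), $(a_i, w_i)_{i=1}^{N}$ are the i.i.d. sub-Gaussian weights of the target network, and $(b_j, v_j)_{j=1}^{M}$ are the weights of the compressed network being optimized.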