In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M < N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the population L2 loss between the outputs of the target and of the compressed network, under the assumption of Gaussian inputs. By using tools from high-dimensional probability, we show that this non-convex problem can be simplified when the target network is sufficiently over-parameterized, and we provide the error rate of this approximation as a function of the input dimension and N. For a ReLU activation function, we conjecture that the optimum of the simplified optimization problem is achieved by placing the weights of the compressed network on an Equiangular Tight Frame (ETF), with the scaling of the weights and the orientation of the ETF depending on the parameters of the target network. Numerical evidence is provided to support this conjecture.
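To make the setting concrete, a plausible formalization of the compression objective is sketched below; the normalization and the presence of second-layer weights in the target network are illustrative assumptions, not details taken from the abstract.

\[
\min_{\{a_i,\, v_i\}_{i=1}^{M}} \;\; \mathbb{E}_{x \sim \mathcal{N}(0, I_d)}\!\left[\left(\sum_{j=1}^{N} b_j\,\sigma(\langle w_j, x\rangle) \;-\; \sum_{i=1}^{M} a_i\,\sigma(\langle v_i, x\rangle)\right)^{2}\right],
\]

where \(\sigma\) is the activation function (e.g., ReLU), \(\{w_j, b_j\}_{j=1}^{N}\) are the i.i.d. sub-Gaussian parameters of the target network, and \(\{v_i, a_i\}_{i=1}^{M}\) are the parameters of the compressed network being optimized.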