We consider the well-studied problem of learning a linear combination of $k$ ReLU activations with respect to a Gaussian distribution on inputs in $d$ dimensions. We give the first polynomial-time algorithm that succeeds whenever $k$ is a constant. All prior polynomial-time learners require additional assumptions on the network, such as positive combining coefficients or the matrix of hidden weight vectors being well-conditioned. Our approach is based on analyzing random contractions of higher-order moment tensors. We use a multi-scale analysis to argue that sufficiently close neurons can be collapsed together, sidestepping the conditioning issues present in prior work. This allows us to design an iterative procedure to discover individual neurons.
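For concreteness, here is a minimal sketch of the setup in notation we introduce only for illustration (the paper's own notation may differ): write $a_i \in \mathbb{R}$ for the combining coefficients, $w_i \in \mathbb{R}^d$ for the hidden weight vectors, $\sigma(t) = \max\{t, 0\}$ for the ReLU, and $\mathrm{He}_m$ for the order-$m$ Hermite tensor. The target function, and the schematic effect of contracting its order-$m$ Gaussian moment tensor along a direction $g$, are
$$ f(x) \;=\; \sum_{i=1}^{k} a_i\, \sigma(\langle w_i, x\rangle), \qquad x \sim \mathcal{N}(0, I_d), $$
$$ \mathbb{E}_{x \sim \mathcal{N}(0, I_d)}\big[f(x)\, \mathrm{He}_m(x)\big](g, \dots, g) \;=\; \sum_{i=1}^{k} a_i\, c_m\, \|w_i\|\, \Big\langle \tfrac{w_i}{\|w_i\|},\, g \Big\rangle^{m}, $$
where $c_m$ is a constant depending only on the ReLU and $m$ (its $m$-th Hermite coefficient, up to normalization). Contractions of this form expose the directions $w_i/\|w_i\|$, which is roughly the structure an iterative neuron-discovery procedure can exploit; this sketch is only meant to fix ideas, not to reproduce the paper's algorithm.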