We prove several hardness results for training depth-2 neural networks with the ReLU activation function; these networks are simply weighted sums (that may include negative coefficients) of ReLUs. Our goal is to output a depth-2 neural network that minimizes the square loss with respect to a given training set. We prove that this problem is NP-hard already for a network with a single ReLU. We also prove NP-hardness for outputting a weighted sum of $k$ ReLUs minimizing the squared error (for $k>1$) even in the realizable setting (i.e., when the labels are consistent with an unknown depth-2 ReLU network). We are also able to obtain lower bounds on the running time in terms of the desired additive error $\epsilon$. To obtain our lower bounds, we use the Gap Exponential Time Hypothesis (Gap-ETH) as well as a new hypothesis regarding the hardness of approximating the well-known Densest $\kappa$-Subgraph problem in subexponential time (these hypotheses are used separately in proving different lower bounds). For example, we prove that under reasonable hardness assumptions, any proper learning algorithm for finding the best-fitting ReLU must run in time exponential in $1/\epsilon^2$. Together with previous work on improperly learning a ReLU (Goel et al., COLT'17), this implies the first separation between proper and improper algorithms for learning a ReLU. We also study the problem of properly learning a depth-2 network of ReLUs with bounded weights, giving new (worst-case) upper bounds on the running time needed to learn such networks in both the realizable and agnostic settings. Our upper bounds on the running time essentially match our lower bounds in terms of the dependency on $\epsilon$.
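To make the training objective concrete, the following is a minimal sketch (not from the paper; the function names `relu_net` and `square_loss` and the toy data are ours) of the hypothesis class and the square loss being minimized: a weighted sum of $k$ ReLUs, where the coefficients may be negative.

```python
import numpy as np

def relu_net(X, a, W, b):
    # Depth-2 ReLU network: a weighted sum (coefficients a, possibly
    # negative) of k ReLU units with weight matrix W (k x d) and biases b.
    return np.maximum(0.0, X @ W.T + b) @ a

def square_loss(X, y, a, W, b):
    # Empirical square loss over the training set (X, y).
    return np.mean((relu_net(X, a, W, b) - y) ** 2)

# Tiny illustration with a single ReLU (k = 1).
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
a = np.array([1.0])
W = np.array([[1.0, 1.0]])
b = np.array([0.0])
y = relu_net(X, a, W, b)           # labels generated by the network itself
print(square_loss(X, y, a, W, b))  # 0.0: the realizable setting
```

The hardness results concern exactly this minimization: finding parameters $(a, W, b)$ achieving (approximately) minimal square loss on a given training set, which is NP-hard even for $k=1$.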