We investigate the complexity of training a two-layer ReLU neural network with weight decay regularization. Previous research has shown that the optimal solution of this problem can be found by solving a standard cone-constrained convex program. Using this convex formulation, we prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but also, in certain special cases, exactly corresponds to it. In particular, when $\epsilon\leq\sqrt{84/83}-1\approx 0.006$, we show that it is NP-hard to find an approximate global optimizer of the ReLU network objective with relative error $\epsilon$ with respect to the objective value. Moreover, we develop a randomized algorithm which mirrors the Goemans-Williamson rounding of semidefinite Max-Cut relaxations. To provide polynomial-time approximations, we classify training datasets into three categories: (i) For orthogonal separable datasets, a precise solution can be obtained in polynomial-time. (ii) When there is a negative correlation between samples of different classes, we give a polynomial-time approximation with relative error $\sqrt{\pi/2}-1\approx 0.253$. (iii) For general datasets, the degree to which the problem can be approximated in polynomial-time is governed by a geometric factor that controls the diameter of two zonotopes intrinsic to the dataset. To our knowledge, these results present the first polynomial-time approximation guarantees along with first hardness of approximation results for regularized ReLU networks.
翻译:暂无翻译