We study a class of nonconvex nonsmooth optimization problems in which the objective is the sum of two functions: one is the average of a large number of differentiable functions, and the other is proper, lower semicontinuous, and admits a surrogate function satisfying standard assumptions. Such problems arise in machine learning and regularized empirical risk minimization. However, the nonconvexity and the large-sum structure make the design of new algorithms challenging, and effective algorithms for this setting are scarce. We introduce and study three stochastic variance-reduced majorization-minimization (MM) algorithms, combining the general MM principle with new variance-reduction techniques. We establish almost sure subsequential convergence of the generated sequences to stationary points. We further show that our algorithms attain the best-known complexity bounds in terms of gradient evaluations. We demonstrate the effectiveness of our algorithms on sparse binary classification, sparse multi-class logistic regression, and neural networks, using several widely used, publicly available data sets.
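For concreteness, the composite structure described above can be written in the following generic form; the notation below is an illustrative choice on our part and is not quoted from the paper.

```latex
% Illustrative problem formulation (notation assumed for exposition):
%   f_i : R^d -> R          differentiable, possibly nonconvex, i = 1, ..., n,
%   g   : R^d -> (-inf,+inf] proper, lower semicontinuous, admitting a
%                            surrogate (majorizing) function.
\min_{x \in \mathbb{R}^d} \; F(x)
  \;=\; \frac{1}{n}\sum_{i=1}^{n} f_i(x) \;+\; g(x)
```

A typical instance is sparse binary classification, where each \(f_i\) is the loss on one training example and \(g\) is a sparsity-inducing regularizer; the variance-reduced MM algorithms then approximate the gradient of the large finite sum stochastically while majorizing the nonsmooth term with its surrogate.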