We present new large-scale algorithms for fitting a subgradient regularized multivariate convex regression function to $n$ samples in $d$ dimensions -- a key problem in shape-constrained nonparametric regression with applications in statistics, engineering, and the applied sciences. The infinite-dimensional learning task can be expressed as a convex quadratic program (QP) with $O(nd)$ decision variables and $O(n^2)$ constraints. While instances with $n$ in the lower thousands can be addressed by current algorithms within reasonable runtimes, solving larger problems (e.g., $n\approx 10^4$ or $10^5$) is computationally challenging. To this end, we present an active-set-type algorithm on the dual QP. For computational scalability, we allow for approximate optimization of the reduced sub-problems, and propose randomized augmentation rules for expanding the active set. We derive novel computational guarantees for our algorithms. We demonstrate that our framework can approximately solve instances of the subgradient regularized convex regression problem with $n=10^5$ and $d=10$ within minutes, and that it shows strong computational performance compared to earlier approaches.
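For context, a standard way to write the finite-dimensional QP underlying subgradient regularized convex regression (a sketch in our own notation; the paper's exact objective and regularization may differ) is
$$
\min_{\theta \in \mathbb{R}^n,\; \xi_1,\dots,\xi_n \in \mathbb{R}^d} \;\; \frac{1}{2}\sum_{i=1}^{n} (y_i - \theta_i)^2 \;+\; \frac{\lambda}{2}\sum_{i=1}^{n} \|\xi_i\|_2^2
\quad \text{s.t.} \quad \theta_j \;\ge\; \theta_i + \xi_i^\top (x_j - x_i), \;\; \forall\, i \neq j,
$$
where $\theta_i$ estimates the convex function value at $x_i$, $\xi_i$ is a subgradient at $x_i$, and $\lambda > 0$ is the regularization parameter. This formulation has $n + nd = O(nd)$ decision variables and $n(n-1) = O(n^2)$ linear constraints, matching the counts stated above.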