We present new large-scale algorithms for fitting a subgradient-regularized multivariate convex regression function to $n$ samples in $d$ dimensions -- a key problem in shape-constrained nonparametric regression with widespread applications in statistics, engineering, and the applied sciences. The infinite-dimensional learning task can be expressed as a convex quadratic program (QP) with $O(nd)$ decision variables and $O(n^2)$ constraints. While instances with $n$ in the lower thousands can be addressed by current algorithms within reasonable runtimes, solving larger problems (e.g., $n \approx 10^4$ or $10^5$) is computationally challenging. To this end, we present an active-set-type algorithm on the dual QP. For computational scalability, we perform approximate optimization of the reduced sub-problems and propose randomized augmentation rules for expanding the active set. Although the dual is not strongly convex, we establish a novel linear convergence rate for our algorithm on the dual. We demonstrate that our framework can approximately solve instances of the convex regression problem with $n = 10^5$ and $d = 10$ within minutes, and offers significant computational gains over earlier approaches.
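To make the problem size concrete, the following minimal sketch (an illustration, not the paper's algorithm) builds the standard pairwise convexity constraints $f_j \ge f_i + g_i^\top (x_j - x_i)$ that underlie the QP formulation; the decision variables are the function values $f_i$ and subgradients $g_i$, giving $n + nd = O(nd)$ variables and $n(n-1) = O(n^2)$ constraints. The function name and matrix layout here are illustrative choices.

```python
import numpy as np

def convexity_constraints(X):
    """Build the pairwise constraints f_j >= f_i + g_i^T (x_j - x_i),
    written as A @ z <= 0 for z = concat(f, g.ravel()).

    Variables: (f_1..f_n, g_1..g_n), i.e. n + n*d = O(nd) in total;
    constraints: n*(n-1) = O(n^2) inequalities.
    """
    n, d = X.shape
    rows = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            a = np.zeros(n + n * d)
            a[i] += 1.0                               # +f_i
            a[j] -= 1.0                               # -f_j
            a[n + i * d : n + (i + 1) * d] = X[j] - X[i]  # +g_i^T (x_j - x_i)
            rows.append(a)
    return np.vstack(rows)

# Sanity check: a known convex function f(x) = ||x||^2 with exact
# gradients g(x) = 2x must satisfy every pairwise constraint.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))          # n = 20 samples in d = 3
f = (X ** 2).sum(axis=1)
g = 2.0 * X
z = np.concatenate([f, g.ravel()])
A = convexity_constraints(X)
print(A.shape)                            # (n*(n-1), n + n*d) = (380, 80)
print(bool(np.all(A @ z <= 1e-9)))        # True: all constraints hold
```

The quadratic growth of the constraint set with $n$ is exactly why dense formulations become impractical around $n \approx 10^4$, motivating the active-set approach on the dual.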