Gaussian process (GP) regression is a flexible, nonparametric approach to regression that naturally quantifies uncertainty. In many applications, the numbers of responses and covariates are both large, and a goal is to select covariates that are related to the response. For this setting, we propose a novel, scalable algorithm, coined VGPR, which optimizes a penalized GP log-likelihood based on the Vecchia GP approximation, an ordered conditional approximation from spatial statistics that implies a sparse Cholesky factor of the precision matrix. We traverse the regularization path from strong to weak penalization, sequentially adding candidate covariates based on the gradient of the log-likelihood and deselecting irrelevant covariates via a new quadratic constrained coordinate descent algorithm. We propose Vecchia-based mini-batch subsampling, which provides unbiased gradient estimators. The resulting procedure is scalable to millions of responses and thousands of covariates. Theoretical analysis and numerical studies demonstrate the improved scalability and accuracy relative to existing methods.
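To illustrate the ordered conditional idea behind the Vecchia approximation, the sketch below conditions each response only on its m nearest previously ordered points instead of on all predecessors, which is what induces the sparse Cholesky factor of the precision matrix. This is a minimal NumPy sketch with a squared-exponential kernel; the function name, kernel choice, and hyperparameter defaults are illustrative assumptions, not the paper's VGPR implementation.

```python
import numpy as np

def vecchia_loglik(y, X, m=3, lengthscale=1.0, variance=1.0, nugget=1e-4):
    """Vecchia-style approximate GP log-likelihood (illustrative sketch).

    Each y[i] is conditioned only on its m nearest earlier points in the
    ordering, rather than on all i-1 predecessors.  With m >= n-1 the
    sequential decomposition recovers the exact joint log-density.
    """
    n = len(y)

    def k(a, b):
        # Squared-exponential (RBF) kernel; an illustrative choice.
        return variance * np.exp(-0.5 * np.sum((a - b) ** 2) / lengthscale**2)

    ll = 0.0
    for i in range(n):
        if i == 0:
            # First point has no predecessors: use the marginal prior.
            mu, var = 0.0, k(X[0], X[0]) + nugget
        else:
            # Conditioning set: the m nearest previously ordered points.
            prev = np.arange(i)
            d2 = np.array([np.sum((X[j] - X[i]) ** 2) for j in prev])
            nn = prev[np.argsort(d2)[:m]]
            K_nn = np.array([[k(X[a], X[b]) for b in nn] for a in nn])
            K_nn += nugget * np.eye(len(nn))
            k_in = np.array([k(X[i], X[j]) for j in nn])
            w = np.linalg.solve(K_nn, k_in)          # conditional weights
            mu = w @ y[nn]                           # conditional mean
            var = k(X[i], X[i]) + nugget - w @ k_in  # conditional variance
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return ll
```

Because each conditional involves only an m-by-m solve, the cost is linear in n for fixed m, which is what makes the approach scalable to very large response counts.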