Stochastic gradient descent (SGD) and its variants have established themselves as the go-to algorithms for large-scale machine learning problems with independent samples, owing to their generalization performance and intrinsic computational advantage. With correlated samples, however, the stochastic gradient is a biased estimator of the full gradient; as a result, the behavior of SGD in correlated settings is poorly understood theoretically, which has hindered its use in such cases. In this paper, we focus on hyperparameter estimation for the Gaussian process (GP) and take a step towards breaking this barrier by proving that minibatch SGD converges to a critical point of the full log-likelihood loss function and recovers the model hyperparameters at rate $O(\frac{1}{K})$ for $K$ iterations, up to a statistical error term that depends on the minibatch size. Our theoretical guarantees hold provided that the kernel functions exhibit exponential or polynomial eigendecay, a condition satisfied by a wide range of kernels commonly used in GPs. Numerical studies on both simulated and real datasets demonstrate that minibatch SGD generalizes better than state-of-the-art GP methods while reducing the computational burden and opening a new, previously unexplored, data size regime for GPs.
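To make the setting concrete, the following is a minimal sketch of minibatch SGD for GP hyperparameter estimation. It is an illustration under stated assumptions, not the implementation studied in the paper: we assume an RBF kernel with a fixed lengthscale, a covariance of the form $\theta_1 K_f + \theta_2 I$ (signal and noise variance as the two hyperparameters), a step size decaying as $1/k$, and a projection onto positive values. All function and parameter names (`rbf_kernel`, `minibatch_nll`, `minibatch_nll_grad`, `sgd_gp`, `lr0`, `floor`) are hypothetical.

```python
import numpy as np


def rbf_kernel(X, lengthscale=1.0):
    """Unit-variance RBF kernel matrix (lengthscale held fixed; an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale ** 2)


def minibatch_nll(theta, Xb, yb, lengthscale=1.0):
    """Negative log marginal likelihood of one minibatch, divided by its size."""
    m = len(yb)
    K = theta[0] * rbf_kernel(Xb, lengthscale) + theta[1] * np.eye(m)
    _, logdet = np.linalg.slogdet(K)
    quad = yb @ np.linalg.solve(K, yb)
    return 0.5 * (quad + logdet + m * np.log(2.0 * np.pi)) / m


def minibatch_nll_grad(theta, Xb, yb, lengthscale=1.0):
    """Gradient of minibatch_nll w.r.t. (signal variance, noise variance)."""
    m = len(yb)
    Kf = rbf_kernel(Xb, lengthscale)
    K = theta[0] * Kf + theta[1] * np.eye(m)
    Kinv = np.linalg.inv(K)  # m x m only, so the cost is O(m^3), not O(n^3)
    alpha = Kinv @ yb
    # d/dtheta_j of 0.5 * [y^T K^-1 y + log det K]
    #   = 0.5 * [tr(K^-1 dK_j) - alpha^T dK_j alpha]
    g_signal = 0.5 * (np.sum(Kinv * Kf) - alpha @ Kf @ alpha)
    g_noise = 0.5 * (np.trace(Kinv) - alpha @ alpha)
    return np.array([g_signal, g_noise]) / m


def sgd_gp(X, y, theta0=(1.0, 1.0), batch_size=64, n_iters=500,
           lr0=1.0, floor=1e-4, seed=0):
    """Minibatch SGD on the GP likelihood with step size lr0 / k (an assumption)."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for k in range(1, n_iters + 1):
        # Sample a minibatch uniformly without replacement.
        idx = rng.choice(len(y), size=batch_size, replace=False)
        g = minibatch_nll_grad(theta, X[idx], y[idx])
        # Gradient step, then project so both variances stay positive.
        theta = np.maximum(theta - (lr0 / k) * g, floor)
    return theta
```

Each iteration only factorizes a minibatch-sized covariance matrix, which is where the computational saving over full-likelihood methods comes from; the minibatch gradient is nevertheless a biased estimate of the full gradient because the samples are correlated.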