Stochastic gradient descent (SGD) provides a simple and efficient way to solve a broad range of machine learning problems. Here, we focus on distribution regression (DR), which involves two stages of sampling: first, we regress from probability measures to real-valued responses; second, each distribution is only observed through a bag of samples drawn from it, and these bags are used to solve the overall regression problem. Recently, DR has been tackled by applying kernel ridge regression, and the learning properties of this approach are well understood. However, nothing is known about the learning properties of SGD for two-stage sampling problems. We fill this gap and provide theoretical guarantees for the performance of SGD for DR. Our bounds are optimal in a minimax sense under standard assumptions.
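To make the two-stage setting concrete, the following is a minimal, illustrative sketch (not the paper's exact algorithm or rates): one-pass kernel SGD for distribution regression, where each bag is represented by its empirical kernel mean embedding and two bags are compared through the average Gaussian kernel between their points. The function names, the step-size schedule, and the synthetic data-generating process are assumptions for illustration only.

```python
import numpy as np

def bag_kernel(bag_a, bag_b, gamma=1.0):
    """Linear kernel between empirical mean embeddings of two bags:
    (1 / (N * M)) * sum_{a,b} exp(-gamma * ||x_a - x_b||^2)."""
    sq_dists = ((bag_a[:, None, :] - bag_b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists).mean()

def sgd_distribution_regression(bags, responses, eta0=0.5, gamma=1.0):
    """One pass of SGD in the RKHS induced by bag_kernel.
    The estimator is f(.) = sum_t c_t * bag_kernel(bag_t, .)."""
    coeffs, seen = [], []
    for t, (bag, y) in enumerate(zip(bags, responses)):
        # current prediction on the newly arrived bag
        pred = sum(c * bag_kernel(b, bag, gamma) for c, b in zip(coeffs, seen))
        # stochastic gradient step on the squared loss, with an assumed
        # decaying step-size schedule eta_t = eta0 / sqrt(t + 1)
        eta_t = eta0 / np.sqrt(t + 1)
        coeffs.append(-eta_t * (pred - y))
        seen.append(bag)
    return coeffs, seen

def predict(coeffs, seen, bag, gamma=1.0):
    return sum(c * bag_kernel(b, bag, gamma) for c, b in zip(coeffs, seen))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two-stage sampling: draw a mean m_i (first stage), then a bag of
    # 50 points from N(m_i, 1) (second stage); the response depends on m_i.
    means = rng.uniform(-2, 2, size=200)
    bags = [rng.normal(m, 1.0, size=(50, 1)) for m in means]
    ys = np.sin(means)
    coeffs, seen = sgd_distribution_regression(bags, ys)
    test_bag = rng.normal(1.0, 1.0, size=(50, 1))
    print("prediction on a bag drawn from N(1, 1):",
          predict(coeffs, seen, test_bag))
```

In this sketch the learner never sees the underlying distributions, only the bags, which is exactly the second sampling stage described above; the choice of a linear kernel on mean embeddings is one common way to instantiate DR and is not the only option.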