We analyze the behavior of projected stochastic gradient descent, focusing on the case where the optimum lies on the boundary of the constraint set and the gradient does not vanish at the optimum. In this setting, the iterates can make progress against the objective, in expectation, at every step. When this holds together with an appropriate moment condition on the noise, we prove that the convergence rate of constrained stochastic gradient descent to the optimum differs from, and is typically faster than, that of the unconstrained stochastic gradient descent algorithm. Our results show that the concentration around the optimum is exponentially distributed rather than normally distributed, the latter typically governing the limiting convergence in the unconstrained case. The methods we develop rely on a geometric ergodicity proof, which extends a result on Markov chains by Hajek (1982) to the setting of stochastic approximation algorithms. As examples, we show how the results apply to linear programming and tabular reinforcement learning.
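The setting described above can be illustrated with a toy sketch (not the paper's analysis): minimize a linear objective over an interval, so the gradient never vanishes and the optimum sits on the boundary. Projected SGD then repeatedly drifts toward the boundary and is clipped by the projection. All names and parameter choices here are illustrative assumptions.

```python
import random

def projected_sgd(grad_mean, noise_scale, x0, steps, seed=0):
    """Projected SGD on min_{x in [0, 1]} grad_mean * x (toy example).

    The objective is linear, so its gradient (grad_mean) never vanishes,
    and the optimum x* = 0 lies on the boundary of the constraint set.
    """
    rng = random.Random(seed)
    x = x0
    for n in range(1, steps + 1):
        # Noisy gradient sample: true gradient plus Gaussian noise.
        g = grad_mean + noise_scale * rng.gauss(0.0, 1.0)
        # Robbins-Monro step size a_n = 1/n, then project onto [0, 1].
        x = min(1.0, max(0.0, x - g / n))
    return x

x_final = projected_sgd(grad_mean=1.0, noise_scale=0.5, x0=0.5, steps=10_000)
```

Because the expected drift points at the boundary every step, the projection is active infinitely often and the iterates hug the boundary optimum far more tightly than the usual unconstrained asymptotics would suggest.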