We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error that can be controlled through a batch size parameter. Our proposed oracles are appealing in several practical contexts, for instance, risk measure estimation from a batch of independent and identically distributed (i.i.d.) samples, or simulation optimization, where the function measurements are "biased" due to computational constraints. In either case, increasing the batch size reduces the estimation error. We highlight the applicability of our biased gradient oracles in a risk-sensitive reinforcement learning setting. In the stochastic non-convex optimization context, we analyze a variant of the randomized stochastic gradient (RSG) algorithm with a biased gradient oracle. We quantify the convergence rate of this algorithm by deriving non-asymptotic bounds on its performance. Next, in the stochastic convex optimization setting, we derive non-asymptotic bounds for the last iterate of a stochastic gradient descent (SGD) algorithm with a biased gradient oracle.
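As a rough illustration of the setting, the following sketch runs SGD with a toy biased gradient oracle whose bias and noise both shrink as the batch size grows. It is not the paper's construction: the objective f(x) = x^2 / 2, the 1/sqrt(batch_size) decay of the bias, the 1/k step size, and all names (`biased_gradient_oracle`, `sgd_last_iterate`, `bias_const`, `noise_std`) are assumptions made for this example.

```python
import math
import random


def biased_gradient_oracle(x, batch_size, rng, bias_const=1.0, noise_std=1.0):
    """Hypothetical oracle for f(x) = 0.5 * x**2, whose true gradient is x.

    Assumed error model (not taken from the paper): a deterministic bias of
    bias_const / sqrt(batch_size) plus zero-mean Gaussian noise with standard
    deviation noise_std / sqrt(batch_size).
    """
    bias = bias_const / math.sqrt(batch_size)
    noise = rng.gauss(0.0, noise_std / math.sqrt(batch_size))
    return x + bias + noise


def sgd_last_iterate(x0, batch_size, n_iters=2000, seed=0):
    """Run SGD with the toy oracle and return the last iterate."""
    rng = random.Random(seed)
    x = float(x0)
    for k in range(1, n_iters + 1):
        step = 1.0 / k  # assumed step size for this strongly convex toy problem
        x -= step * biased_gradient_oracle(x, batch_size, rng)
    return x


if __name__ == "__main__":
    for m in (1, 100):
        x_last = sgd_last_iterate(x0=1.0, batch_size=m)
        print(f"batch_size={m:4d}  |last iterate - minimizer| = {abs(x_last):.4f}")
```

In this toy model the last iterate settles near -bias_const / sqrt(batch_size) rather than at the minimizer 0, so the residual error is governed by the oracle's bias and falls as the batch size increases.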