Concentration of Contractive Stochastic Approximation: Additive and Multiplicative Noise

In this work, we study the concentration behavior of a stochastic approximation (SA) algorithm driven by an operator that is contractive with respect to an arbitrary norm. We consider two settings in which the iterates are potentially unbounded: (1) bounded multiplicative noise, and (2) additive sub-Gaussian noise. We obtain maximal concentration inequalities for the convergence error and show that the error has sub-Gaussian tails in the additive-noise setting and super-polynomial tails (decaying faster than any polynomial) in the multiplicative-noise setting. In addition, we provide an impossibility result showing that, in general, SA with multiplicative noise cannot achieve sub-exponential tails. To establish these results, we develop a novel bootstrapping argument that involves bounding the moment generating function of the generalized Moreau envelope of the error and constructing an exponential supermartingale, which enables the use of Ville's maximal inequality. To demonstrate the applicability of our theoretical results, we use them to derive maximal concentration bounds for a large class of reinforcement learning algorithms, including but not limited to on-policy TD-learning with linear function approximation, off-policy TD-learning with generalized importance sampling factors, and $Q$-learning. To the best of our knowledge, super-polynomial concentration bounds for off-policy TD-learning had not previously been established in the literature, owing to the challenge of handling unbounded iterates combined with multiplicative noise.
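To fix ideas, the following is a minimal sketch of the contractive SA template this line of work studies. The abstract does not write out the recursion, so all symbols here ($F$, $Y_k$, $w_k$, $\alpha_k$, $c_m$, $\gamma$, $\|\cdot\|_c$) are our own illustrative choices rather than quotations from the paper:

```latex
% Illustrative only: a standard contractive-SA template, with assumed notation.
% \bar{F} is a \gamma-contraction with respect to an arbitrary norm \|\cdot\|_c.
\begin{align*}
  x_{k+1} &= x_k + \alpha_k \bigl( F(x_k, Y_k) - x_k \bigr),
  \qquad \bar{F}(x) := \mathbb{E}\bigl[ F(x, Y) \bigr], \\
  \| \bar{F}(x) - \bar{F}(y) \|_c &\le \gamma \, \| x - y \|_c
  \quad \text{for all } x, y, \qquad \gamma \in [0, 1).
\end{align*}
% Setting (1), bounded multiplicative noise: the effective noise
% F(x_k, Y_k) - \bar{F}(x_k) is bounded by c_m (1 + \|x_k\|_c), i.e., it can
% grow linearly with the iterate; this coupling between noise and iterate is
% what rules out sub-exponential tails in general.
% Setting (2), additive sub-Gaussian noise:
%   x_{k+1} = x_k + \alpha_k \bigl( \bar{F}(x_k) - x_k + w_k \bigr),
% with \{w_k\} sub-Gaussian, which yields sub-Gaussian error tails.
```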
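As a concrete instance of the algorithm class covered by these bounds, here is a hedged Python sketch of on-policy TD(0) with linear function approximation on a toy Markov chain. The environment, feature map, and all names below are hypothetical and chosen only to illustrate why this update is a contractive SA recursion with multiplicative noise; it is not an implementation from the paper.

```python
import numpy as np

def td0_linear(P, R, phi, alpha=0.05, gamma=0.9, num_iters=50_000, seed=0):
    """On-policy TD(0) on a finite Markov chain, approximating V(s) ~ phi[s] @ theta.

    P is the transition matrix under the target policy, R[s] the expected
    reward in state s, and phi an (n_states x d) feature matrix. The noise in
    the update scales with ||theta|| (multiplicative noise), so the paper's
    results imply super-polynomial, though not sub-exponential, tail bounds
    on the error ||theta_k - theta*||.
    """
    rng = np.random.default_rng(seed)
    n_states, d = phi.shape
    theta = np.zeros(d)
    s = 0
    for _ in range(num_iters):
        s_next = rng.choice(n_states, p=P[s])
        # TD error: delta = r + gamma * V(s') - V(s), with V(s) = phi[s] @ theta
        delta = R[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
        theta += alpha * delta * phi[s]  # semi-gradient update
        s = s_next
    return theta

# Toy 5-state random walk with a 2-dimensional feature map.
P = np.zeros((5, 5))
for i in range(5):
    P[i, (i - 1) % 5] = 0.5
    P[i, (i + 1) % 5] = 0.5
R = np.linspace(0.0, 1.0, 5)                               # expected reward per state
phi = np.stack([np.ones(5), np.arange(5) / 4.0], axis=1)   # features [1, s/4]
print(td0_linear(P, R, phi))
```

Note that the TD error `delta` multiplies the current parameter `theta`, so the noise magnitude grows with the iterate; this is exactly the bounded-multiplicative-noise structure described in setting (1) above.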