We consider Linear Stochastic Approximation (LSA) with a constant stepsize and Markovian data. Viewing the joint process of the data and LSA iterate as a time-homogeneous Markov chain, we prove its convergence to a unique limiting and stationary distribution in Wasserstein distance and establish non-asymptotic, geometric convergence rates. Furthermore, we show that the bias vector of this limit admits an infinite series expansion with respect to the stepsize. Consequently, the bias is proportional to the stepsize up to higher order terms. This result stands in contrast with LSA under i.i.d. data, for which the bias vanishes. In the reversible chain setting, we provide a general characterization of the relationship between the bias and the mixing time of the Markovian data, establishing that they are roughly proportional to each other. While Polyak-Ruppert tail-averaging reduces the variance of the LSA iterates, it does not affect the bias. The above characterization allows us to show that the bias can be reduced using Richardson-Romberg extrapolation with $m \ge 2$ stepsizes, which eliminates the $m - 1$ leading terms in the bias expansion. This extrapolation scheme leads to an exponentially smaller bias and an improved mean squared error, both in theory and empirically. Our results immediately apply to the Temporal Difference learning algorithm with linear function approximation, Markovian data and constant stepsizes.
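To make the ingredients above concrete, here is a minimal sketch, not the paper's experimental setup: a hypothetical two-state (hence reversible) Markov chain drives a constant-stepsize LSA recursion $\theta_{k+1} = \theta_k + \alpha\,(b(x_k) - A(x_k)\theta_k)$, the iterates are Polyak-Ruppert tail-averaged, and Richardson-Romberg extrapolation with $m = 2$ stepsizes combines the two averages as $2\,\bar\theta(\alpha) - \bar\theta(2\alpha)$ to cancel the leading $O(\alpha)$ bias term. All matrices, stepsizes, and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-state Markov chain (every two-state chain is reversible).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
# State-dependent LSA data A(x), b(x); the target solves A_bar @ theta = b_bar,
# where bars denote averages under the chain's stationary distribution.
A = [np.array([[1.0, 0.2], [0.0, 1.5]]),
     np.array([[2.0, -0.3], [0.1, 1.0]])]
b = [np.array([1.0, 0.0]),
     np.array([0.0, 1.0])]

def lsa_tail_average(alpha, n_iters=100_000, burn_in=50_000):
    """Constant-stepsize LSA driven by Markovian data, returning the
    Polyak-Ruppert tail average of the iterates (averaged after burn_in)."""
    x = 0
    theta = np.zeros(2)
    tail_sum = np.zeros(2)
    for k in range(n_iters):
        x = rng.choice(2, p=P[x])                      # next Markovian sample
        theta = theta + alpha * (b[x] - A[x] @ theta)  # LSA update
        if k >= burn_in:
            tail_sum += theta
    return tail_sum / (n_iters - burn_in)

# Richardson-Romberg extrapolation with m = 2 stepsizes (alpha and 2*alpha):
# the combination 2*theta(alpha) - theta(2*alpha) removes the O(alpha) bias term.
alpha = 0.01
theta_a  = lsa_tail_average(alpha)
theta_2a = lsa_tail_average(2 * alpha)
theta_rr = 2 * theta_a - theta_2a

# Reference solution theta* for this toy instance.
pi = np.array([2 / 3, 1 / 3])              # stationary distribution of P
A_bar = pi[0] * A[0] + pi[1] * A[1]
b_bar = pi[0] * b[0] + pi[1] * b[1]
theta_star = np.linalg.solve(A_bar, b_bar)

print("bias, single stepsize :", theta_a - theta_star)
print("bias, RR extrapolation:", theta_rr - theta_star)
```

In this sketch the tail average controls the variance but, as the abstract notes, leaves the stepsize-dependent bias in place; only the two-stepsize extrapolation shrinks it, mirroring the role of the $m - 1$ cancelled terms in the bias expansion.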