We consider Linear Stochastic Approximation (LSA) with a constant stepsize and Markovian data. Viewing the joint process of the data and LSA iterate as a time-homogeneous Markov chain, we prove its convergence to a unique limiting and stationary distribution in Wasserstein distance and establish non-asymptotic, geometric convergence rates. Furthermore, we show that the bias vector of this limit admits an infinite series expansion with respect to the stepsize. Consequently, the bias is proportional to the stepsize up to higher order terms. This result stands in contrast with LSA under i.i.d. data, for which the bias vanishes. In the reversible chain setting, we provide a general characterization of the relationship between the bias and the mixing time of the Markovian data, establishing that they are roughly proportional to each other. While Polyak-Ruppert tail-averaging reduces the variance of the LSA iterates, it does not affect the bias. The above characterization allows us to show that the bias can be reduced using Richardson-Romberg extrapolation with $m\ge 2$ stepsizes, which eliminates the $m-1$ leading terms in the bias expansion. This extrapolation scheme leads to an exponentially smaller bias and an improved mean squared error, both in theory and empirically. Our results immediately apply to the Temporal Difference learning algorithm with linear function approximation, Markovian data, and constant stepsizes.
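To make the constant-stepsize pipeline described above concrete, the following is a minimal sketch (not the paper's exact setup) of TD(0) with linear function approximation on a small synthetic Markov reward process, combined with Polyak-Ruppert tail averaging and Richardson-Romberg extrapolation using two stepsizes. All problem parameters (transition matrix, rewards, features, discount factor, stepsizes, horizons) are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative Markov reward process with linear function approximation
# (all of these quantities are hypothetical, chosen only for the sketch).
n_states, d, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(n_states), size=n_states)   # row-stochastic transition matrix
r = rng.normal(size=n_states)                          # per-state rewards
Phi = rng.normal(size=(n_states, d))                   # feature matrix (one row per state)

def td_tail_average(alpha, n_iters, burn_in):
    """Constant-stepsize TD(0) on a single Markovian trajectory,
    returning the Polyak-Ruppert tail average of the iterates."""
    theta = np.zeros(d)
    tail_sum, tail_count = np.zeros(d), 0
    s = 0
    for t in range(n_iters):
        s_next = rng.choice(n_states, p=P[s])
        # TD(0) update: one linear stochastic approximation step
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + alpha * td_error * Phi[s]
        if t >= burn_in:          # tail averaging reduces variance, not bias
            tail_sum += theta
            tail_count += 1
        s = s_next
    return tail_sum / tail_count

alpha, n_iters, burn_in = 0.05, 200_000, 50_000
theta_a  = td_tail_average(alpha,     n_iters, burn_in)   # bias of order alpha
theta_2a = td_tail_average(2 * alpha, n_iters, burn_in)   # bias of order 2*alpha

# Richardson-Romberg extrapolation with m = 2 stepsizes (alpha, 2*alpha):
# the combination 2*theta(alpha) - theta(2*alpha) cancels the leading
# O(alpha) term in the bias expansion, leaving higher-order terms.
theta_rr = 2 * theta_a - theta_2a
print("tail avg (alpha):  ", theta_a)
print("tail avg (2*alpha):", theta_2a)
print("RR extrapolation:  ", theta_rr)
```

In this sketch the two runs share the same stepsize ratio used in standard two-point Richardson-Romberg extrapolation; with $m \ge 2$ stepsizes one would instead solve for weights that cancel the first $m-1$ terms of the bias expansion.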