We consider stochastic optimization problems where data is drawn from a Markov chain. Existing methods for this setting crucially rely on knowing the mixing time of the chain, which in real-world applications is usually unknown. We propose the first optimization method that does not require knowledge of the mixing time, yet obtains the optimal asymptotic convergence rate when applied to convex problems. We further show that our approach can be extended to: (i) finding stationary points in non-convex optimization with Markovian data, and (ii) obtaining better dependence on the mixing time in temporal difference (TD) learning; in both cases, our method is completely oblivious to the mixing time. Our method relies on a novel combination of multi-level Monte Carlo (MLMC) gradient estimation and an adaptive learning method.
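As a concrete illustration of the general MLMC idea (not the paper's own code), the sketch below builds an MLMC gradient estimator over a Markovian data stream and plugs it into a scalar AdaGrad-style loop. The estimator draws a random level J with P(J = j) = 2^{-j} and returns g^0 + 2^J (g^J - g^{J-1}), where g^j averages 2^j consecutive stochastic gradients; in expectation this matches an average over 2^{j_max} correlated samples while keeping the expected per-query cost logarithmic, with no mixing-time input. The AR(1) data stream, the `grad_traj` oracle, the truncation level `j_max`, and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Markovian data stream: an AR(1) chain z_{t+1} = rho * z_t + noise.
# At stationarity the chain has mean 0, so grad f(x; z) = x - z is an
# unbiased (but correlated) estimate of the gradient of f(x) = ||x||^2 / 2.
rho, d = 0.9, 5
state = rng.standard_normal(d)

def grad_traj(x, n):
    """Return n consecutive stochastic gradients along the chain (assumed oracle)."""
    global state
    out = np.empty((n, x.size))
    for i in range(n):
        state = rho * state + np.sqrt(1 - rho**2) * rng.standard_normal(x.size)
        out[i] = x - state
    return out

def mlmc_gradient(x, j_max=10):
    """MLMC estimator: g^0 + 2^J (g^J - g^{J-1}) with J ~ Geom(1/2)."""
    J = rng.geometric(p=0.5)            # P(J = j) = 2^{-j}, j = 1, 2, ...
    if J > j_max:                       # truncation keeps the cost bounded;
        return grad_traj(x, 1)[0]       # fall back to the single-sample g^0
    grads = grad_traj(x, 2 ** J)        # 2^J consecutive gradients
    g0 = grads[0]                       # g^0
    gJ = grads.mean(axis=0)             # g^J: mean of all 2^J samples
    gJm1 = grads[: 2 ** (J - 1)].mean(axis=0)  # g^{J-1}: mean of first half
    return g0 + (2 ** J) * (gJ - gJm1)  # telescoping correction

# Feed the estimate to an AdaGrad-style step (adaptive learning rate):
x, G = np.zeros(d), 1e-8
for t in range(200):
    g = mlmc_gradient(x)
    G += g @ g
    x -= 0.5 / np.sqrt(G) * g           # step size self-tunes to gradient norms
```

The adaptive step size is the second half of the combination: because AdaGrad-style rates calibrate themselves to the realized magnitudes of the MLMC estimates, no mixing-time-dependent constant ever enters the tuning.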