We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme, and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contrast to worst-case guarantees, our bounds are instance-dependent, and achieve the local asymptotic minimax risk non-asymptotically. For linear operators, contractivity can be relaxed to multi-step contractivity, so that the theory can be applied to problems such as average-reward policy evaluation in reinforcement learning. We illustrate the theory via applications to stochastic shortest path problems, two-player zero-sum Markov games, as well as policy evaluation and $Q$-learning for tabular Markov decision processes.
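To make the setup concrete, the following is a minimal sketch of a variance-reduced fixed-point scheme in the flavor described above, for the special case of a linear contractive operator on $\mathbb{R}^d$ under a stochastic query model. All names (`sample_operator`, `vr_fixed_point`) and the specific recentering/inner-loop structure are illustrative assumptions, not the paper's exact algorithm: each epoch estimates the operator at an anchor point from a large batch, then runs inner updates in which the same random query is evaluated at both the iterate and the anchor, so the query noise largely cancels near the anchor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem instance: a linear contractive operator
# T(x) = A x + b with spectral radius of A below 1, so the fixed
# point x* = (I - A)^{-1} b exists and is unique.
d = 5
A = 0.5 * np.eye(d) + (0.1 / np.sqrt(d)) * rng.standard_normal((d, d))
b = rng.standard_normal(d)
x_star = np.linalg.solve(np.eye(d) - A, b)  # true fixed point

def sample_operator(sigma=0.1):
    """One stochastic query: a random affine operator with mean T."""
    A_i = A + sigma * rng.standard_normal((d, d))
    b_i = b + sigma * rng.standard_normal(d)
    return lambda x: A_i @ x + b_i

def vr_fixed_point(epochs=20, inner=200, batch=2000, eta=0.5):
    x_bar = np.zeros(d)
    for _ in range(epochs):
        # Recentering: estimate T(x_bar) with a large batch of queries.
        T_bar = np.mean([sample_operator()(x_bar) for _ in range(batch)],
                        axis=0)
        x = x_bar.copy()
        for _ in range(inner):
            # Variance-reduced update: evaluating the SAME random
            # operator at x and at x_bar makes the noise in the
            # correction term shrink as x approaches x_bar.
            T_i = sample_operator()
            g = T_i(x) - T_i(x_bar) + T_bar
            x = (1 - eta) * x + eta * g
        x_bar = x
    return x_bar

x_hat = vr_fixed_point()
```

The anchor `x_bar` plays the role of the reference point in standard variance-reduction schemes (in the spirit of SVRG-style recentering); after a few epochs the residual error is driven by the batch estimate of `T(x_bar)` rather than by the per-query noise.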