Discretization-based approaches to solving online reinforcement learning problems have been studied extensively in practice on applications ranging from resource allocation to cache management. Two major questions in designing discretization-based algorithms are how to create the discretization and when to refine it. While there have been several experimental results investigating heuristic solutions to these questions, there has been little theoretical treatment. In this paper we provide a unified theoretical analysis of tree-based hierarchical partitioning methods for online reinforcement learning, yielding both model-free and model-based algorithms. We show how our algorithms take advantage of inherent problem structure by providing guarantees that scale with respect to the 'zooming dimension', an instance-dependent quantity measuring the benignness of the optimal $Q_h^\star$ function, instead of the ambient dimension. Many applications in computing systems and operations research require algorithms that compete on three facets: low sample complexity, mild storage requirements, and low computational burden. Our algorithms are easily adapted to such operating constraints, and our theory provides explicit bounds across each of the three facets. This motivates their use in practical applications, as the approach automatically adapts to underlying problem structure even when very little is known a priori about the system.
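To make the "how to create the discretization" and "when to refine it" questions concrete, the following is a minimal Python sketch of one plausible tree-based adaptive partition in the spirit described above. It is not the paper's exact algorithm: the `Node` structure, the optimistic learning rate, the diameter-proportional exploration bonus, and the visit-count splitting threshold are all illustrative assumptions.

```python
import itertools

class Node:
    """One cell of a dyadic tree partition over the joint
    state-action space [0, 1]^d (an assumed representation)."""

    def __init__(self, center, radius, q_init):
        self.center = center    # tuple: center of the cell
        self.radius = radius    # half side-length of the cell
        self.q = q_init         # optimistic Q-value estimate for the cell
        self.n = 0              # visit count
        self.children = []      # empty while the cell is a leaf

    def split(self):
        """Refine the cell into 2^d children of half the radius;
        children inherit the parent's Q-value estimate."""
        r = self.radius / 2.0
        for offsets in itertools.product((-r, r), repeat=len(self.center)):
            child_center = tuple(c + o for c, o in zip(self.center, offsets))
            self.children.append(Node(child_center, r, self.q))


def relevant_leaves(root, state):
    """Leaves whose state coordinates (the leading dims) cover `state`."""
    out, stack = [], [root]
    while stack:
        nd = stack.pop()
        if nd.children:
            stack.extend(nd.children)
        # zip stops at the shorter tuple, so only state dims are checked
        elif all(abs(s - c) <= nd.radius + 1e-12
                 for s, c in zip(state, nd.center)):
            out.append(nd)
    return out


def select(root, state):
    """Greedy action selection: pick the relevant leaf with the best Q."""
    return max(relevant_leaves(root, state), key=lambda nd: nd.q)


def update(node, reward, v_next, horizon=5):
    """One optimistic Q-learning step on a leaf, then the refinement rule:
    split once the visit count exceeds the inverse-squared diameter
    (one common choice; the exact threshold here is an assumption)."""
    node.n += 1
    alpha = (horizon + 1.0) / (horizon + node.n)  # step size decaying in n
    bonus = 2.0 * node.radius                     # bonus ~ cell diameter
    node.q = (1 - alpha) * node.q + alpha * (reward + v_next + bonus)
    if not node.children and node.n >= (1.0 / (2.0 * node.radius)) ** 2:
        node.split()


# usage: joint space [0,1]^2 (one state dim, one action dim)
root = Node(center=(0.5, 0.5), radius=0.5, q_init=5.0)
leaf = select(root, state=(0.3,))
update(leaf, reward=1.0, v_next=4.0)
```

The splitting rule is what lets such methods adapt: cells are refined only where visits accumulate, so the tree grows fine near frequently visited, high-value regions and stays coarse elsewhere, which is the mechanism behind bounds that depend on the zooming dimension rather than the ambient dimension.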