Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has recently been applied to solve a number of challenging problems. In this paper, we focus primarily on off-policy evaluation (OPE), one of the most fundamental topics in RL. In recent years, a number of OPE methods have been developed in the statistics and computer science literature. We discuss the efficiency bound of OPE, several existing state-of-the-art OPE methods and their statistical properties, and other related research directions that are currently being actively explored.