We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, a key problem in many safety-critical applications. Given the increasing interest in deploying learning-based methods, there has been a flurry of recent proposals for OPE methods, leading to a need for standardized empirical analyses. Our work places a strong emphasis on diversity in experimental design to enable stress testing of OPE methods. We provide a comprehensive benchmarking suite to study the interplay of different attributes on method performance, and we distill the results into a summarized set of guidelines for OPE in practice. Our software package, the Caltech OPE Benchmarking Suite (COBS), is open-sourced, and we invite interested researchers to contribute further to the benchmark.
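To make the OPE problem concrete, the sketch below shows the classic trajectory-wise importance sampling estimator, one of the standard baselines a benchmark of this kind typically covers. This is a minimal illustration only, not COBS's API: the function names, signatures, and toy one-step MDP are all hypothetical.

```python
import numpy as np

def importance_sampling_ope(trajectories, pi_e, pi_b, gamma=0.99):
    """Trajectory-wise importance sampling estimate of the value of pi_e.

    trajectories: list of [(state, action, reward), ...] collected under pi_b.
    pi_e, pi_b: functions (state, action) -> action probability under the
        evaluation and behavior policies, respectively.
    """
    returns = []
    for traj in trajectories:
        weight, discounted_return = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(s, a) / pi_b(s, a)   # cumulative likelihood ratio
            discounted_return += (gamma ** t) * r
        # Reweight the behavior-policy return toward the evaluation policy.
        returns.append(weight * discounted_return)
    return float(np.mean(returns))

# Toy one-step MDP: action 1 pays reward 1, action 0 pays reward 0.
rng = np.random.default_rng(0)
pi_b = lambda s, a: 0.5                      # uniform behavior policy
pi_e = lambda s, a: 0.9 if a == 1 else 0.1   # evaluation policy favors action 1
data = [[(0, a, float(a))] for a in rng.integers(0, 2, size=10_000)]
print(importance_sampling_ope(data, pi_e, pi_b))  # ~0.9, the true value of pi_e
```

Even this simple estimator hints at why stress testing matters: the cumulative likelihood ratio can have very high variance on long horizons or when the two policies diverge, which is exactly the kind of attribute a benchmark can vary systematically.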