Meticulously analysing the empirical strengths and weaknesses of reinforcement learning methods in hard environments is essential to inspire innovations and assess progress in the field. In tabular reinforcement learning, there is no well-established standard selection of environments for conducting such analysis, partly because the rich theory of environment hardness is not widely understood. The goal of this paper is to unlock the practical usefulness of this theory through four main contributions. First, we present a systematic survey of the theory of hardness, which also identifies promising research directions. Second, we introduce Colosseum, a pioneering package that enables empirical hardness analysis and implements a principled benchmark composed of environments that are diverse with respect to different measures of hardness. Third, we present an empirical analysis that provides new insights into computable measures of hardness. Finally, we evaluate five tabular agents on our newly proposed benchmark. While advancing the theoretical understanding of hardness in non-tabular reinforcement learning remains essential, our contributions in the tabular setting are intended as solid steps towards a principled non-tabular benchmark. Accordingly, we also benchmark four agents in non-tabular versions of Colosseum environments, obtaining results that demonstrate the generality of tabular hardness measures.