While Reinforcement Learning (RL) has made great strides towards solving ever more complicated tasks, many algorithms are still brittle to even slight changes in their environment. This is a limiting factor for real-world applications of RL. Although the research community continuously aims at improving both the robustness and generalization of RL algorithms, it unfortunately still lacks an open-source set of well-defined benchmark problems based on a consistent theoretical framework that allows comparing different approaches in a fair, reliable and reproducible way. To fill this gap, we propose CARL, a collection of well-known RL environments extended to contextual RL problems for studying generalization. We demonstrate the urgent need for such benchmarks by showing that even simple toy environments become challenging for commonly used approaches when different contextual instances of the same task have to be considered. Furthermore, CARL allows us to provide first evidence that disentangling representation learning of the states from policy learning with the context facilitates better generalization. By providing variations of diverse benchmarks from classic control, physical simulations, games and a real-world application of RNA design, CARL will allow the community to derive many more such insights on a solid empirical foundation.