While Reinforcement Learning (RL) has made great strides towards solving increasingly complicated problems, many algorithms are still brittle to even slight changes in their environments. Contextual Reinforcement Learning (cRL) provides a theoretical framework to model such changes in a principled manner, thereby enabling flexible, precise and interpretable task specification and generation. Thus, cRL is an important formalization for studying generalization in RL. In this work, we reason about solving cRL in theory and practice. We show that theoretically optimal behavior in contextual Markov Decision Processes requires explicit context information. In addition, we empirically explore context-based task generation, utilize context information in training, and propose cGate, our state-modulating policy architecture. To this end, we introduce CARL, the first benchmark library designed for generalization based on cRL extensions of popular benchmarks. In short: Context matters!
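To make the idea of a state-modulating policy concrete, the following is a minimal, illustrative sketch of a context-gated policy in the spirit of cGate, assuming the learned context embedding modulates the state features element-wise before the action head. The class name, layer sizes, and the sigmoid gate are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class ContextGatedPolicy(nn.Module):
    """Sketch of a policy whose state features are gated by a context embedding."""

    def __init__(self, state_dim: int, context_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Encode the observed state into a feature vector.
        self.state_encoder = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())
        # Encode the context (e.g. gravity, pole length) into a gate of matching width.
        self.context_encoder = nn.Sequential(nn.Linear(context_dim, hidden_dim), nn.Sigmoid())
        # Map the modulated features to action logits.
        self.head = nn.Linear(hidden_dim, action_dim)

    def forward(self, state: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        state_features = self.state_encoder(state)
        gate = self.context_encoder(context)
        # Element-wise modulation of state features by the context gate.
        return self.head(state_features * gate)


# Usage: a batch of 8 states (dim 4) and contexts (dim 2) mapped to 2 action logits.
policy = ContextGatedPolicy(state_dim=4, context_dim=2, action_dim=2)
logits = policy(torch.randn(8, 4), torch.randn(8, 2))
print(logits.shape)  # torch.Size([8, 2])
```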