In recent years, a growing number of deep model-based reinforcement learning (RL) methods have been introduced. The interest in deep model-based RL is not surprising, given its many potential benefits, such as higher sample efficiency and the potential for fast adaptation to changes in the environment. However, we demonstrate, using an improved version of the recently introduced Local Change Adaptation (LoCA) setup, that well-known model-based methods such as PlaNet and DreamerV2 perform poorly in their ability to adapt to local environmental changes. Combined with prior work that made a similar observation about another popular model-based method, MuZero, a trend appears to emerge, suggesting that current deep model-based methods have serious limitations. We dive deeper into the causes of this poor performance by identifying elements that hurt adaptive behavior and linking these to underlying techniques frequently used in deep model-based RL. We empirically validate these insights in the case of linear function approximation by demonstrating that a modified version of linear Dyna achieves effective adaptation to local changes. Furthermore, we provide detailed insights into the challenges of building an adaptive nonlinear model-based method, by experimenting with a nonlinear version of Dyna.