There has been rapidly growing interest in meta-learning as a method for increasing the flexibility and sample efficiency of reinforcement learning. One problem in this area of research, however, has been a scarcity of adequate benchmark tasks. In general, the structure underlying past benchmarks has either been too simple to be inherently interesting, or too ill-defined to support principled analysis. In the present work, we introduce a new benchmark for meta-RL research, which combines structural richness with structural transparency. Alchemy is a 3D video game, implemented in Unity, which involves a latent causal structure that is resampled procedurally from episode to episode, affording structure learning, online inference, hypothesis testing and action sequencing based on abstract domain knowledge. We evaluate a pair of powerful RL agents on Alchemy and present an in-depth analysis of one of these agents. Results clearly indicate a frank and specific failure of meta-learning, providing validation for Alchemy as a challenging benchmark for meta-RL. Concurrent with this report, we are releasing Alchemy as a public resource, together with a suite of analysis tools and sample agent trajectories.