Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents

Jane X. Wang, Michael King, Nicolas Porcel, Zeb Kurth-Nelson, Tina Zhu, Charlie Deck, Peter Choy, Mary Cassin, Malcolm Reynolds, Francis Song, Gavin Buttimore, David P. Reichert, Neil Rabinowitz, Loic Matthey, Demis Hassabis, Alexander Lerchner, Matthew Botvinick

From arXiv. Published in Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 2021.

There has been rapidly growing interest in meta-learning as a method for increasing the flexibility and sample efficiency of reinforcement learning. One problem in this area of research, however, has been a scarcity of adequate benchmark tasks. In general, the structure underlying past benchmarks has either been too simple to be inherently interesting, or too ill-defined to support principled analysis. In the present work, we introduce a new benchmark for meta-RL research, emphasizing transparency and potential for in-depth analysis as well as structural richness. Alchemy is a 3D video game, implemented in Unity, which involves a latent causal structure that is resampled procedurally from episode to episode, affording structure learning, online inference, hypothesis testing and action sequencing based on abstract domain knowledge. We evaluate a pair of powerful RL agents on Alchemy and present an in-depth analysis of one of these agents. Results clearly indicate a frank and specific failure of meta-learning, providing validation for Alchemy as a challenging benchmark for meta-RL. Concurrent with this report, we are releasing Alchemy as a public resource, together with a suite of analysis tools and sample agent trajectories.
