Reinforcement learning agents must generalize beyond their training experience. Prior work has focused mostly on identical training and evaluation environments. Starting from the recently introduced Crafter benchmark, a 2D open-world survival game, we introduce a new set of environments suitable for evaluating an agent's ability to generalize to previously unseen objects (and unseen numbers of objects) and to adapt quickly (meta-learning). In Crafter, agents are evaluated by the number of achievements they unlock (such as collecting resources) when trained for 1M steps. We show that current agents struggle to generalize, and introduce novel object-centric agents that improve over strong baselines. We also provide critical insights of general interest for future work on Crafter through several experiments. We show that careful hyperparameter tuning improves the PPO baseline agent by a large margin and that even feedforward agents can unlock almost all achievements by relying on the inventory display. We achieve new state-of-the-art performance on the original Crafter environment. Additionally, when trained beyond 1M steps, our tuned agents can unlock almost all achievements. We show that recurrent PPO agents improve over feedforward ones, even with the inventory information removed. We introduce CrafterOOD, a set of 15 new environments that evaluate out-of-distribution (OOD) generalization. On CrafterOOD, we show that current agents fail to generalize, whereas our novel object-centric agents achieve state-of-the-art OOD generalization while also being interpretable. Our code is public.