One of the grand challenges of reinforcement learning is the ability to generalize to new tasks. However, general agents require a rich, diverse set of tasks to train on. Designing a `foundation environment' for such tasks is tricky -- the ideal environment would support a range of emergent phenomena, an expressive task space, and fast runtime. To take a step towards addressing this research bottleneck, this work presents Powderworld, a lightweight yet expressive simulation environment that runs directly on the GPU. Within Powderworld, two motivating challenge distributions are presented, one for world modelling and one for reinforcement learning. Each contains hand-designed test tasks to examine generalization. Experiments indicate that increasing the environment's complexity improves generalization for world models and certain reinforcement learning agents, yet may inhibit learning in high-variance environments. Powderworld aims to support the study of generalization by providing a source of diverse tasks arising from the same core rules.