Evaluating the general abilities of intelligent agents requires complex simulation environments. Existing benchmarks typically evaluate only one narrow task per environment, requiring researchers to perform expensive training runs on many different environments. We introduce Crafter, an open-world survival game with visual inputs that evaluates a wide range of general abilities within a single environment. Agents learn either from the provided reward signal or through intrinsic objectives, and are evaluated by semantically meaningful achievements that can be unlocked during each episode, such as discovering resources and crafting tools. Consistently unlocking all achievements requires strong generalization, deep exploration, and long-term reasoning. We experimentally verify that Crafter is of appropriate difficulty to drive future research, and we provide baseline scores for reward-driven agents and unsupervised agents. Furthermore, we observe sophisticated behaviors emerging from maximizing the reward signal, such as building tunnel systems, bridges, houses, and plantations. We hope that Crafter will accelerate research progress by quickly evaluating a wide spectrum of abilities.
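To make the achievement-based evaluation concrete, the following is a minimal sketch of how per-achievement success rates could be tallied over a set of evaluation episodes. It does not use the Crafter package itself: the function name `success_rates`, the episode data, and the achievement names are illustrative assumptions, and the abstract does not specify how these rates are aggregated into a final score.

```python
from collections import Counter


def success_rates(episodes, achievements):
    """Return, per achievement, the percentage of episodes that unlocked it.

    `episodes` is a list of sets, each holding the names of achievements
    unlocked at least once during that episode (hypothetical data format).
    """
    if not episodes:
        raise ValueError("need at least one episode")
    tracked = set(achievements)
    counts = Counter()
    for unlocked in episodes:
        # Count each achievement at most once per episode.
        counts.update(set(unlocked) & tracked)
    return {name: 100.0 * counts[name] / len(episodes) for name in achievements}


# Illustrative data only: four episodes and three assumed achievement names.
episodes = [
    {"collect_wood", "place_table"},
    {"collect_wood"},
    set(),
    {"collect_wood", "place_table", "make_wood_pickaxe"},
]
achievements = ["collect_wood", "place_table", "make_wood_pickaxe"]
rates = success_rates(episodes, achievements)
```

Reporting a rate per achievement, rather than a single reward total, keeps the breadth of abilities visible: an agent that reliably collects resources but never crafts a tool shows up as such.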