Autonomous agents have made great strides in specialist domains like Atari games and Go. However, they typically learn tabula rasa in isolated environments with limited and manually conceived objectives, thus failing to generalize across a wide spectrum of tasks and capabilities. Inspired by how humans continually learn and adapt in the open world, we advocate a trinity of ingredients for building generalist agents: 1) an environment that supports a multitude of tasks and goals, 2) a large-scale database of multimodal knowledge, and 3) a flexible and scalable agent architecture. We introduce MineDojo, a new framework built on the popular Minecraft game that features a simulation suite with thousands of diverse open-ended tasks and an internet-scale knowledge base with Minecraft videos, tutorials, wiki pages, and forum discussions. Using MineDojo's data, we propose a novel agent learning algorithm that leverages large pre-trained video-language models as a learned reward function. Our agent is able to solve a variety of open-ended tasks specified in free-form language without any manually designed dense shaping reward. We open-source the simulation suite, knowledge bases, algorithm implementation, and pretrained models (https://minedojo.org) to promote research towards the goal of generally capable embodied agents.
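The idea of using a pre-trained video-language model as a learned reward can be illustrated with a small sketch. The abstract does not say how the model's output becomes a scalar reward, so this example assumes one plausible choice: the cosine similarity between an embedding of the agent's recent frames and an embedding of the free-form task prompt. The function names (`cosine_similarity`, `language_reward`) and the hand-written toy vectors are hypothetical and stand in for real encoder outputs; this is not MineDojo's implementation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def language_reward(video_embedding: Sequence[float],
                    text_embedding: Sequence[float]) -> float:
    """Assumed reward: how well the recent video clip matches the task prompt.

    In a real system both embeddings would come from a pre-trained
    video-language model; here they are passed in directly.
    """
    return cosine_similarity(video_embedding, text_embedding)


# Toy embeddings (hypothetical values, not produced by any real model).
prompt_embedding = [1.0, 0.0, 0.0]      # stands in for an encoded task prompt
on_task_clip = [0.9, 0.1, 0.0]          # clip that roughly matches the prompt
off_task_clip = [0.0, 0.0, 1.0]         # clip unrelated to the prompt

on_task_reward = language_reward(on_task_clip, prompt_embedding)
off_task_reward = language_reward(off_task_clip, prompt_embedding)
```

Under this assumption the reward is dense: every step yields a score from the model, so no task-specific shaping term has to be written by hand, and a clip that matches the prompt scores higher than one that does not.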