We held the first-ever MineRL Benchmark for Agents that Solve Almost-Lifelike Tasks (MineRL BASALT) Competition at the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021). The goal of the competition was to promote research on agents that use learning from human feedback (LfHF) techniques to solve open-world tasks. Rather than mandating the use of LfHF techniques, we described four tasks in natural language to be accomplished in the video game Minecraft, and allowed participants to use any approach they wanted to build agents that could accomplish the tasks. Teams developed a diverse range of LfHF algorithms across a variety of possible human feedback types. The three winning teams implemented significantly different approaches while achieving similar performance. Interestingly, their approaches performed well on different tasks, validating our choice of tasks to include in the competition. While the outcomes validated the design of our competition, we did not attract as many participants and submissions as our sister competition, MineRL Diamond. We speculate about the causes of this shortfall and suggest improvements for future iterations of the competition.