As artificial intelligence research advances, the platforms used to evaluate AI agents must grow and adapt to keep challenging them. We present the Polycraft World AI Lab (PAL), a task simulator with an API built on the Minecraft mod Polycraft World. The platform lets AI agents with different architectures easily interact with the Minecraft world, train, and be evaluated on multiple tasks. PAL supports flexible task creation and can manipulate any aspect of a task during an evaluation. All actions taken by AI agents and by external actors (non-player characters, NPCs) in the open-world environment are logged to streamline evaluation. We present two custom tasks on the PAL platform, one focused on multi-step planning and one focused on navigation, together with evaluations of agents solving them. In summary, we report a versatile and extensible AI evaluation platform with a low barrier to entry for AI researchers.