Real-world tasks of interest are generally poorly defined by human-readable descriptions and have no predefined reward signal unless a human designer specifies one. Conversely, data-driven algorithms are often designed to solve a specific, narrowly defined task with performance metrics that drive the agent's learning. In this work, we present the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS Competition MineRL BASALT Challenge: Learning from Human Feedback in Minecraft, which challenged participants to use human data to solve four tasks defined only by a natural language description, with no reward function. Our approach uses the available human demonstration data to train an imitation learning policy for navigation, and additional human feedback to train an image classifier. These modules, combined with an estimated odometry map, form a state machine designed to use human knowledge in a natural hierarchical paradigm. We compare this hybrid intelligence approach to both end-to-end machine learning and purely engineered solutions, with all agents judged by human evaluators. The codebase is available at https://github.com/viniciusguigo/kairos_minerl_basalt.
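To make the hierarchical structure concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a learned navigation policy and an image classifier are available as plain callables, and that odometry is estimated by dead reckoning from the agent's own actions. Every name here (`HybridAgent`, `Odometry`, `State`, `threshold`, `act_steps`) is hypothetical, and the simplified two-value action is an assumption made only for this example.

```python
import math
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Tuple

Action = Tuple[float, float]  # (forward distance, turn in degrees); simplified assumption


class State(Enum):
    NAVIGATE = auto()  # the learned policy explores
    ACT = auto()       # an engineered subtask routine runs
    DONE = auto()


@dataclass
class Odometry:
    """Dead-reckoning position estimate built from the agent's own actions."""
    x: float = 0.0
    y: float = 0.0
    heading_deg: float = 0.0
    path: List[Tuple[float, float]] = field(default_factory=lambda: [(0.0, 0.0)])

    def update(self, forward: float, turn_deg: float) -> None:
        self.heading_deg = (self.heading_deg + turn_deg) % 360.0
        rad = math.radians(self.heading_deg)
        self.x += forward * math.cos(rad)
        self.y += forward * math.sin(rad)
        self.path.append((self.x, self.y))


@dataclass
class HybridAgent:
    nav_policy: Callable[[object], Action]  # hypothetical imitation-learned policy
    classifier: Callable[[object], float]   # hypothetical P(target visible | obs)
    threshold: float = 0.8                  # assumed decision threshold
    act_steps: int = 3                      # assumed length of the scripted routine
    state: State = State.NAVIGATE
    odom: Odometry = field(default_factory=Odometry)
    _act_left: int = 0

    def step(self, obs: object) -> Action:
        if self.state is State.NAVIGATE:
            if self.classifier(obs) >= self.threshold:
                # Classifier fired: hand control to the engineered routine.
                self.state = State.ACT
                self._act_left = self.act_steps
            else:
                action = self.nav_policy(obs)
                self.odom.update(*action)
                return action
        if self.state is State.ACT:
            # Placeholder for a scripted subtask; here it simply stands still.
            self._act_left -= 1
            if self._act_left <= 0:
                self.state = State.DONE
        return (0.0, 0.0)
```

In this sketch the learned component handles only open-ended navigation, while the transitions between states are hand-written rules, which is one way to read the "hybrid intelligence" split described above.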