Learning rational behaviors in open-world games like Minecraft remains challenging for Reinforcement Learning (RL) research due to the compound challenges of partial observability, high-dimensional visual perception, and delayed reward. To address this, we propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to handle perception and exploration. Specifically, our approach consists of two levels of hierarchy, where the high-level controller learns a policy over options and the low-level workers learn to solve each sub-task. To boost the learning of sub-tasks, we propose a combination of techniques, including 1) action-aware representation learning, which captures the underlying relations between actions and representations; 2) discriminator-based self-imitation learning for efficient exploration; and 3) ensemble behavior cloning with consistency filtering for policy robustness. Extensive experiments show that JueWu-MC significantly improves sample efficiency and outperforms a set of baselines by a large margin. Notably, we won the championship of the NeurIPS MineRL 2021 research competition and achieved the highest performance score ever.
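To make the controller/worker decomposition concrete, below is a minimal sketch of how a two-level hierarchy can be organized. Everything here is an illustrative assumption rather than the paper's implementation: the class names (Worker, Controller), the option set, and the fixed-length sub-task segments are invented for exposition, and the random policies stand in for the learned ones.

```python
import random

# Illustrative sketch only: a two-level hierarchy in the spirit of the
# controller/worker decomposition described above. All names and the
# fixed segment length are assumptions, not the authors' implementation.

class Worker:
    """Low-level policy for one sub-task, e.g. gathering logs."""
    def __init__(self, name, actions):
        self.name = name
        self.actions = actions

    def act(self, obs):
        # A trained worker would map the observation to an action;
        # here we sample uniformly as a stand-in.
        return random.choice(self.actions)


class Controller:
    """High-level policy that selects which option (worker) to execute."""
    def __init__(self, workers):
        self.workers = workers

    def select_option(self, obs):
        # A trained controller would condition on obs; random stand-in.
        return random.choice(self.workers)


def rollout(controller, env_step, obs, num_options=4, segment_len=3):
    """Alternate between high-level option selection and low-level
    execution: each chosen worker acts for a short sub-task segment."""
    trace = []
    for _ in range(num_options):
        worker = controller.select_option(obs)
        for _ in range(segment_len):
            action = worker.act(obs)
            obs = env_step(action)
            trace.append((worker.name, action))
    return trace


if __name__ == "__main__":
    workers = [
        Worker("chop_log", ["forward", "attack"]),
        Worker("craft_pickaxe", ["craft", "equip"]),
    ]
    controller = Controller(workers)
    # Dummy environment: echoes the action back as the next observation.
    print(rollout(controller, env_step=lambda a: a, obs=None))
```

In a real agent, the controller's option choices and each worker's actions would come from policies trained with the techniques listed above (action-aware representation learning, self-imitation, and ensemble behavior cloning) rather than uniform sampling.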