We study building a multi-task agent in Minecraft. Without human demonstrations, solving long-horizon tasks in this open-ended environment with reinforcement learning (RL) is extremely sample-inefficient. To tackle this challenge, we decompose solving Minecraft tasks into learning basic skills and planning over the skills. We propose three types of fine-grained basic skills in Minecraft, and use RL with intrinsic rewards to accomplish basic skills with high success rates. For skill planning, we use Large Language Models to find the relationships between skills and build a skill graph in advance. When the agent is solving a task, our skill search algorithm walks on the skill graph and generates the proper skill plans for the agent. In experiments, our method accomplishes 24 diverse Minecraft tasks, many of which require sequentially executing more than 10 skills. Our method outperforms baselines on most tasks by a large margin. The project's website and code can be found at https://sites.google.com/view/plan4mc.
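The skill planning described above can be illustrated with a minimal sketch: represent the skill graph as a mapping from each skill to its prerequisite skills, then depth-first search from the target skill to produce an executable plan. The skill names and graph structure below are illustrative placeholders, not the paper's actual LLM-constructed graph or search algorithm.

```python
# Illustrative skill graph: skill -> list of prerequisite skills.
# These entries are hypothetical examples, not the paper's real graph.
SKILL_GRAPH = {
    "harvest_log": [],
    "craft_planks": ["harvest_log"],
    "craft_stick": ["craft_planks"],
    "craft_crafting_table": ["craft_planks"],
    "craft_wooden_pickaxe": ["craft_stick", "craft_crafting_table"],
}

def plan(target, graph, done=None, order=None):
    """Depth-first search on the skill graph: recursively satisfy all
    prerequisites of `target`, then append `target` itself, yielding
    an ordered skill plan the agent can execute sequentially."""
    if done is None:
        done, order = set(), []
    if target in done:          # skill already scheduled
        return order
    for prereq in graph[target]:
        plan(prereq, graph, done, order)
    done.add(target)
    order.append(target)
    return order

print(plan("craft_wooden_pickaxe", SKILL_GRAPH))
# -> ['harvest_log', 'craft_planks', 'craft_stick',
#     'craft_crafting_table', 'craft_wooden_pickaxe']
```

In this sketch, shared prerequisites (here, `craft_planks`) are scheduled only once; a long-horizon task such as crafting a pickaxe unrolls into a sequence of basic skills in dependency order.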