Goal-Space Planning with Subgoal Models

November 1, 2022

Chunlok Lo, Gabor Mihucz, Adam White, Farzane Aminmansour, Martha White

This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives, such as Double DQN, even though the former uses significantly more memory and computation. The fundamental problem is that learned models can be inaccurate and often generate invalid states, especially when iterated many steps. In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models. This goal-space planning (GSP) approach is more computationally efficient, naturally incorporates temporal abstraction for faster long-horizon planning and avoids learning the transition dynamics entirely. We show that our GSP algorithm can learn significantly faster than a Double DQN baseline in a variety of situations.
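The abstract gives no equations, so the following is only a minimal sketch of what planning restricted to a small set of subgoals could look like, not the authors' algorithm. It assumes each subgoal-conditioned model returns an accumulated reward and an accumulated discount for travelling to a subgoal; the names `plan_over_subgoals` and `project_to_state`, the toy subgoals, and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical model format (an assumption, not taken from the paper):
# (accumulated reward, accumulated discount) for travelling between two points.
Model = Tuple[float, float]


def plan_over_subgoals(
    subgoal_models: Dict[Tuple[str, str], Model],
    subgoals: List[str],
    sweeps: int = 50,
) -> Dict[str, float]:
    """Value iteration restricted to the abstract subgoal space."""
    values = {g: 0.0 for g in subgoals}
    for _ in range(sweeps):
        for g in subgoals:
            backups = [
                reward + discount * values[g_next]
                for (g_from, g_next), (reward, discount) in subgoal_models.items()
                if g_from == g
            ]
            if backups:  # subgoals with no outgoing model keep value 0
                values[g] = max(backups)
    return values


def project_to_state(local_models: Dict[str, Model], values: Dict[str, float]) -> float:
    """Value of a concrete state, using only its models to nearby subgoals."""
    return max(r + d * values[g] for g, (r, d) in local_models.items())


subgoals = ["A", "B", "C", "goal"]
subgoal_models = {
    ("A", "B"): (0.0, 0.5),
    ("A", "C"): (0.0, 0.2),
    ("B", "C"): (0.0, 0.5),
    ("C", "goal"): (1.0, 0.0),  # discount 0 marks termination at the goal
}
values = plan_over_subgoals(subgoal_models, subgoals)
state_value = project_to_state({"A": (0.0, 0.9), "B": (0.0, 0.5)}, values)
print(values, state_value)
```

In this toy setup, planning touches only four subgoal values and never predicts a next state, which is the property the abstract highlights. How the resulting values would feed back into the model-free learner is left out, since the abstract does not specify it.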
