Planning-based reinforcement learning has shown strong performance in tasks with discrete and low-dimensional continuous action spaces. However, planning usually incurs significant computational overhead for decision-making, and scaling such methods to high-dimensional action spaces remains challenging. To advance efficient planning for high-dimensional continuous control, we propose the Trajectory Autoencoding Planner (TAP), which learns low-dimensional latent action codes with a state-conditional VQ-VAE. The decoder of the VQ-VAE thus serves as a novel dynamics model that takes latent actions and the current state as input and reconstructs long-horizon trajectories. At inference time, given a starting state, TAP searches over discrete latent actions to find trajectories that have both high probability under the training distribution and high predicted cumulative reward. Empirical evaluation in the offline RL setting demonstrates low decision latency that does not grow with the raw action dimensionality. On Adroit robotic hand manipulation tasks with high-dimensional continuous action spaces, TAP surpasses existing model-based methods by a large margin and also beats strong model-free actor-critic baselines.
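The abstract describes two pieces: a state-conditional VQ-VAE whose decoder acts as a dynamics model over trajectory segments, and an inference-time search over its discrete latent codes that trades off likelihood against predicted return. The PyTorch sketch below is a minimal illustration of that structure under assumed details, not TAP's actual architecture; the module and function names, the latent prior, the return head, and the scoring rule are all illustrative assumptions.

```python
# Minimal sketch of the two components described in the abstract, not the
# authors' implementation: a state-conditional VQ-VAE over trajectory
# segments, plus a planner that searches discrete latent codes for high
# return. Module/function names and hyperparameters are illustrative.
import torch
import torch.nn as nn


class LatentDynamicsVQVAE(nn.Module):
    """Encodes trajectory segments into discrete codes; the decoder acts as a
    dynamics model mapping (current state, latent code) -> trajectory."""

    def __init__(self, state_dim, traj_dim, num_codes=512, code_dim=64, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + traj_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, code_dim),
        )
        self.codebook = nn.Embedding(num_codes, code_dim)
        # The decoder reconstructs the flattened trajectory segment
        # (future states, actions, rewards) from the state plus quantized code.
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, traj_dim),
        )

    def quantize(self, z):
        # Nearest-neighbour codebook lookup; the code index is the
        # low-dimensional discrete "latent action".
        idx = torch.cdist(z, self.codebook.weight).argmin(dim=-1)
        return self.codebook(idx), idx

    def forward(self, state, traj):
        z = self.encoder(torch.cat([state, traj], dim=-1))
        z_q, idx = self.quantize(z)
        recon = self.decoder(torch.cat([state, z_q], dim=-1))
        return recon, idx


@torch.no_grad()
def plan(model, prior_logits, state, return_head, num_samples=64):
    """Sample candidate latent codes from a learned prior, decode them with
    the current state, and keep the trajectory that balances prior
    likelihood with predicted return. `prior_logits` (per-code logits) and
    `return_head` (trajectory -> scalar return) are assumed components."""
    probs = prior_logits.softmax(dim=-1)
    idx = torch.multinomial(probs, num_samples, replacement=True)
    z_q = model.codebook(idx)                         # (num_samples, code_dim)
    states = state.expand(num_samples, -1)            # broadcast the start state
    trajs = model.decoder(torch.cat([states, z_q], dim=-1))
    # Naive combination of log-likelihood under the prior and predicted return.
    score = probs.log()[idx] + return_head(trajs).squeeze(-1)
    return trajs[score.argmax()]
```

Note the design point this sketch is meant to surface: the search happens over a fixed-size discrete codebook rather than the raw continuous action space, which is why decision latency can stay flat as the raw action dimensionality grows.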