Model-Based Offline Planning with Trajectory Pruning - Zhuanzhi (专知) Papers


May 16, 2021

Model-Based Offline Planning with Trajectory Pruning


Xianyuan Zhan, Xiangyu Zhu, Haoran Xu

Offline reinforcement learning (RL) enables learning policies from pre-collected datasets without environment interaction, which provides a promising direction for making RL usable in real-world systems. Although recent offline RL studies have achieved much progress, existing methods still face many practical challenges in real-world system control tasks, such as computational restrictions during agent training and the requirement of extra control flexibility. Model-based planning frameworks provide an attractive solution for such tasks. However, most model-based planning algorithms are not designed for offline settings. Simply combining the ingredients of offline RL with existing methods either yields over-restrictive planning or leads to inferior performance. We propose a new lightweight model-based offline planning framework, namely MOPP, which tackles the dilemma between the restrictions of offline learning and high-performance planning. MOPP encourages more aggressive trajectory rollout guided by the behavior policy learned from data, and prunes out problematic trajectories to avoid potential out-of-distribution samples. Experimental results show that MOPP provides competitive performance compared with existing model-based offline planning and RL approaches, and allows easy adaptation to varying objectives and extra constraints.
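The abstract names two ingredients, rollouts guided by a behavior policy and pruning of trajectories that may leave the data distribution, but gives no algorithmic detail. The following is only a minimal sketch of how such a planner could be arranged, not the authors' implementation. Everything in it is an assumption made for illustration: the function `plan_action`, the `sigma_scale` factor that widens the behavior policy's sampling, the use of ensemble disagreement as the pruning signal, the `max_disagreement` threshold, and the toy policy, dynamics, and reward.

```python
import numpy as np

def plan_action(state, behavior_policy, ensemble, reward_fn,
                horizon=5, n_traj=64, sigma_scale=2.0,
                max_disagreement=0.5, rng=None):
    """Return the first action of the best non-pruned imagined trajectory.

    Hypothetical sketch: all names and the pruning rule are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    states = np.repeat(state[None, :], n_traj, axis=0)  # (n_traj, state_dim)
    returns = np.zeros(n_traj)
    worst_disagreement = np.zeros(n_traj)
    first_actions = None

    for t in range(horizon):
        mean, std = behavior_policy(states)
        # Sample around the behavior policy with a widened std
        # (assumed stand-in for "more aggressive" rollouts).
        actions = mean + sigma_scale * std * rng.standard_normal(mean.shape)
        if t == 0:
            first_actions = actions
        # Each ensemble member predicts the next state.
        preds = np.stack([f(states, actions) for f in ensemble])
        # Disagreement: largest per-dimension std across ensemble members.
        disagreement = preds.std(axis=0).max(axis=1)
        worst_disagreement = np.maximum(worst_disagreement, disagreement)
        returns += reward_fn(states, actions)
        states = preds.mean(axis=0)

    # Prune trajectories whose worst-step disagreement exceeds the threshold.
    keep = worst_disagreement <= max_disagreement
    if not keep.any():
        # Fallback: keep only the least uncertain trajectory.
        keep = worst_disagreement == worst_disagreement.min()
    best = np.argmax(np.where(keep, returns, -np.inf))
    return first_actions[best], keep


# Toy stand-ins (hypothetical): a policy that pushes the state toward
# the origin, three slightly different linear models, and a quadratic cost.
def behavior_policy(states):
    return -0.5 * states, 0.1 * np.ones_like(states)

ensemble = [lambda s, a, k=k: s + a * (1.0 + 0.05 * k) for k in range(3)]

def reward_fn(states, actions):
    return -np.sum(states ** 2, axis=1)

action, keep = plan_action(np.array([1.0, -1.0]), behavior_policy,
                           ensemble, reward_fn,
                           rng=np.random.default_rng(0))
```

In this sketch the reward function is passed in at planning time, so swapping it or adding a constraint check before selection would not require retraining the learned models, which is one way to read the abstract's claim about adapting to varying objectives and extra constraints.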

