In this paper, we present a Model-Based Reinforcement Learning algorithm named Monte Carlo Probabilistic Inference for Learning COntrol (MC-PILCO). The algorithm relies on Gaussian Processes (GPs) to model the system dynamics and on a Monte Carlo approach to estimate the policy gradient. This defines a framework in which we ablate the choice of the following components: (i) the selection of the cost function, (ii) the optimization of policies with dropout, and (iii) the improvement of data efficiency through the use of structured kernels in the GP models. The combination of the aforementioned aspects dramatically affects the performance of MC-PILCO. Numerical comparisons in a simulated cart-pole environment show that MC-PILCO exhibits better data efficiency and control performance than state-of-the-art GP-based MBRL algorithms. Finally, we apply MC-PILCO to real systems, considering in particular systems with partially measurable states. We discuss the importance of modeling both the measurement system and the state estimators during policy optimization. The effectiveness of the proposed solutions has been tested in simulation and on two real systems: a Furuta pendulum and a ball-and-plate setup.