Contextual Latent-Movements Off-Policy Optimization for Robotic Manipulation Skills - 专知 Papers

February 11, 2022

Contextual Latent-Movements Off-Policy Optimization for Robotic Manipulation Skills

Samuele Tosatto,Georgia Chalvatzaki,Jan Peters

Parameterized movement primitives have been used extensively for imitation learning of robotic tasks. However, the high dimensionality of the parameter space hinders the improvement of such primitives in the reinforcement learning (RL) setting, especially when learning with physical robots. In this paper we propose a novel view on handling the demonstrated trajectories to acquire low-dimensional, non-linear latent dynamics, using mixtures of probabilistic principal component analyzers (MPPCA) on the movements' parameter space. Moreover, we introduce a new contextual off-policy RL algorithm, named LAtent-Movements Policy Optimization (LAMPO). LAMPO can provide gradient estimates from previous experience using self-normalized importance sampling, thereby making full use of samples collected in previous learning iterations. Combined, these advantages provide a complete framework for sample-efficient off-policy optimization of movement primitives for robot learning of high-dimensional manipulation skills. Our experiments, conducted both in simulation and on a real robot, show that LAMPO learns policies more sample-efficiently than common approaches in the literature.
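The abstract names self-normalized importance sampling as the mechanism that lets LAMPO reuse samples from earlier iterations. The sketch below is not the LAMPO estimator, which targets policy gradients over contexts and latent movement parameters. It is only a minimal one-dimensional illustration of the underlying idea, assuming a Gaussian behavior policy that collected the data and a different Gaussian target policy whose expected reward we want to estimate. All names (`gaussian_pdf`, `snis_estimate`, `demo`) and the reward r(x) = x are hypothetical choices made for this example.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def snis_estimate(samples, rewards, target_pdf, behavior_pdf):
    """Self-normalized importance sampling estimate of E_target[reward].

    Each sample drawn from the behavior distribution is weighted by
    target_pdf(x) / behavior_pdf(x), and the weighted sum of rewards is
    divided by the sum of the weights (not by the sample count).
    """
    weights = [target_pdf(x) / behavior_pdf(x) for x in samples]
    total_weight = sum(weights)
    return sum(w * r for w, r in zip(weights, rewards)) / total_weight


def demo(n=20000, seed=0):
    """Reuse data from an old policy N(0, 1) to evaluate a new policy N(1, 1).

    With the toy reward r(x) = x, the true expected reward under the
    new policy is its mean, 1.0.
    """
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(n)]  # collected off-policy
    rewards = list(samples)  # hypothetical reward: r(x) = x
    return snis_estimate(
        samples,
        rewards,
        target_pdf=lambda x: gaussian_pdf(x, 1.0, 1.0),
        behavior_pdf=lambda x: gaussian_pdf(x, 0.0, 1.0),
    )


if __name__ == "__main__":
    print("SNIS estimate of the expected reward under the new policy:", demo())
```

Dividing by the sum of the weights rather than by the number of samples makes the estimate a weighted average of observed rewards. This introduces a small bias but typically lowers variance, and it keeps the estimate within the range of the observed rewards.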
