IL-flOw: Imitation Learning from Observation using Normalizing Flows - Zhuanzhi Papers


Tags: Learning · Normalizing · Policy Search · Models · Inverse Reinforcement Learning

May 19, 2022

Wei-Di Chang, Juan Camilo Gamboa Higuera, Scott Fujimoto, David Meger, Gregory Dudek

From arXiv. Presented at the 4th Robot Learning Workshop at NeurIPS 2021.

We present an algorithm for Inverse Reinforcement Learning (IRL) from expert state observations only. Our approach decouples reward modelling from policy learning, unlike state-of-the-art adversarial methods, which require updating the reward model during policy search and are known to be unstable and difficult to optimize. Our method, IL-flOw, recovers the expert policy by modelling state-state transitions: it generates rewards using deep density estimators trained on the demonstration trajectories, which avoids the instability issues of adversarial methods. We demonstrate that using the state-transition log-probability density as a reward signal for forward reinforcement learning amounts to matching the trajectory distribution of the expert demonstrations, and we show experimentally both good recovery of the true reward signal and state-of-the-art results for imitation from observation on locomotion and robotic continuous control tasks.
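The core idea in the abstract, a density model fitted once to expert state transitions, then frozen and queried for the reward r(s, s') = log p(s' | s) during forward RL, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper uses deep density estimators (normalizing flows), whereas this sketch swaps in a single conditional affine transform with a linear mean fitted by least squares and a constant diagonal scale, which is the simplest possible normalizing flow. The class name `LinearAffineFlowReward` and the toy linear "expert" dynamics are hypothetical.

```python
import numpy as np


class LinearAffineFlowReward:
    """Hypothetical stand-in for a learned conditional density p(s' | s).

    One conditional affine transform z = (s' - (A s + b)) / sigma with a
    standard-normal base density. The paper's deep flows are NOT reproduced.
    """

    def __init__(self, eps=1e-6):
        self.eps = eps  # floor on the per-dimension scale to avoid log(0)

    @staticmethod
    def _with_bias(states):
        # Append a column of ones so the bias b is fitted together with A.
        return np.hstack([states, np.ones((len(states), 1))])

    def fit(self, states, next_states):
        # Least-squares fit of the conditional mean s' ~ A s + b on expert data.
        X = self._with_bias(states)
        self.W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
        residuals = next_states - X @ self.W
        # Maximum-likelihood diagonal scale of the residuals.
        self.sigma = np.sqrt(residuals.var(axis=0)) + self.eps
        return self

    def log_prob(self, states, next_states):
        # Forward pass of the flow: map s' to the base variable z.
        z = (next_states - self._with_bias(states) @ self.W) / self.sigma
        d = next_states.shape[1]
        base = -0.5 * np.sum(z**2, axis=1) - 0.5 * d * np.log(2 * np.pi)
        # Change of variables: log p(s'|s) = log N(z; 0, I) + log|det dz/ds'|,
        # where dz/ds' = diag(1 / sigma), so the log-det is -sum(log sigma).
        return base - np.sum(np.log(self.sigma))


# Toy "expert" demonstrations from assumed linear dynamics plus small noise.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.95]])
states = rng.normal(size=(500, 2))
next_states = states @ A_true.T + 0.05 * rng.normal(size=(500, 2))

# Fit once on demonstrations; the reward model stays frozen afterwards.
model = LinearAffineFlowReward().fit(states, next_states)

s = np.array([[1.0, -0.5]])
expert_like = s @ A_true.T              # transition the expert would make
off_expert = s + np.array([[1.0, 1.0]])  # transition far from the data
r_expert = model.log_prob(s, expert_like)[0]
r_off = model.log_prob(s, off_expert)[0]
print(bool(r_expert > r_off))  # True
```

Because `fit` runs once on the demonstrations and the model is never updated during policy search, any forward RL algorithm can treat `model.log_prob(s, s_next)` as its per-step reward; this is the decoupling the abstract contrasts with adversarial methods. Replacing the single affine layer with a deeper flow would change only `fit` and `log_prob`.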
