离线模仿学习的最佳传输方法 (Optimal Transport for Offline Imitation Learning) - 专知论文

会员服务 ·

0

离线强化学习 · 传输 · 传输方法 · 演示 · 强化学习 ·

2023 年 3 月 24 日

Optimal Transport for Offline Imitation Learning

翻译：离线模仿学习的最佳传输方法

Yicheng Luo,Zhengyao Jiang,Samuel Cohen,Edward Grefenstette,Marc Peter Deisenroth

from arxiv, Published in ICLR 2023

With the advent of large datasets, offline reinforcement learning (RL) is a promising framework for learning good decision-making policies without the need to interact with the real environment. However, offline RL requires the dataset to be reward-annotated, which presents practical challenges when reward engineering is difficult or when obtaining reward annotations is labor-intensive. In this paper, we introduce Optimal Transport Reward labeling (OTR), an algorithm that assigns rewards to offline trajectories, with a few high-quality demonstrations. OTR's key idea is to use optimal transport to compute an optimal alignment between an unlabeled trajectory in the dataset and an expert demonstration to obtain a similarity measure that can be interpreted as a reward, which can then be used by an offline RL algorithm to learn the policy. OTR is easy to implement and computationally efficient. On D4RL benchmarks, we show that OTR with a single demonstration can consistently match the performance of offline RL with ground-truth rewards.

翻译：随着大数据集的出现，离线强化学习框架是一种非常有前途的学习好的决策政策的方法，无需与实际环境进行交互。但是，离线强化学习需要对数据集进行奖励注释，当奖励工程困难或获得奖励注释耗费大量人力时，会带来实际难题。在本文中，我们引入了最佳传输奖励标记（OTR）算法，该算法利用一些高质量的演示将奖励赋予离线轨迹。OTR的关键思想是使用最佳传输计算无标签数据集中的轨迹与专家演示之间的最优对齐，以得到可解释为奖励的相似性度量，然后离线强化学习算法可以利用该度量学习决策政策。OTR易于实现且计算效率高。在D4RL基准测试中，我们展示了使用单一演示的OTR可以持续地匹配使用基于真实奖励的离线强化学习的性能。

0

相关内容

离线强化学习

离线强化学习

【ICML2021】预测观察进行模仿学习

专知会员服务

24+阅读 · 2021年7月10日

WWW21最新「比较学习」教程，135页PPT阐述从排名数据中学习

专知会员服务

37+阅读 · 2021年4月27日

最新《模仿学习 - Imitation Learning》教程，63页ppt，微软Kamil Ciosek

最新《模仿学习 - Imitation Learning》教程，63页ppt，微软Kamil Ciosek

专知会员服务

66+阅读 · 2020年8月22日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【NeurIPS2019】模仿学习中的因果混乱问题 Causal Confusion in Imitation Learning

【NeurIPS2019】模仿学习中的因果混乱问题 Causal Confusion in Imitation Learning

专知会员服务

30+阅读 · 2019年12月10日

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

专知会员服务

24+阅读 · 2019年11月11日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

浅聊对比学习（Contrastive Learning）

浅聊对比学习（Contrastive Learning）

极市平台

2+阅读 · 2022年7月26日

浅聊对比学习（Contrastive Learning）第一弹

浅聊对比学习（Contrastive Learning）第一弹

PaperWeekly

0+阅读 · 2022年6月10日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

基于重要性采样的并行离策略强化学习方法研究

国家自然科学基金

23+阅读 · 2015年12月31日

资源受限环境下实时超分辨重建方法研究

国家自然科学基金

1+阅读 · 2014年12月31日

基于不确定规划理论的无线网络视频流传输研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于智能在线虚拟参考反馈整定的控制方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于机器学习的混合动力电动汽车在线智能控制研究

国家自然科学基金

2+阅读 · 2013年12月31日

基于一致性数据融合的磨机负荷自适应软测量研究

国家自然科学基金

0+阅读 · 2013年12月31日

多元函数的稀疏逼近与随机逼近

国家自然科学基金

1+阅读 · 2012年12月31日

淹没环境下前混合磨料水射流对低透气性煤层的冲蚀机理及增透效应研究

国家自然科学基金

0+阅读 · 2012年12月31日

大规模机器学习问题的结构优化方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

Style Transfer Enabled Sim2Real Framework for Efficient Learning of Robotic Ultrasound Image Analysis Using Simulated Data

Arxiv

0+阅读 · 2023年5月16日

Meta-optimized Contrastive Learning for Sequential Recommendation

Arxiv

0+阅读 · 2023年5月16日

A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes

Arxiv

0+阅读 · 2023年5月15日

PPO-ABR: Proximal Policy Optimization based Deep Reinforcement Learning for Adaptive BitRate streaming

Arxiv

0+阅读 · 2023年5月14日

Selective imitation on the basis of reward function similarity

Arxiv

0+阅读 · 2023年5月12日

Inapplicable Actions Learning for Knowledge Transfer in Reinforcement Learning

Arxiv

0+阅读 · 2023年5月11日

Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

Arxiv

13+阅读 · 2021年12月20日

Active Learning for Domain Adaptation: An Energy-based Approach

Arxiv

13+阅读 · 2021年12月2日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

Deep Reinforcement Learning for List-wise Recommendations

Arxiv

13+阅读 · 2018年1月5日

VIP会员

文章信息

相关主题

离线强化学习

相关VIP内容

【ICML2021】预测观察进行模仿学习

专知会员服务

24+阅读 · 2021年7月10日

WWW21最新「比较学习」教程，135页PPT阐述从排名数据中学习

专知会员服务

37+阅读 · 2021年4月27日

最新《模仿学习 - Imitation Learning》教程，63页ppt，微软Kamil Ciosek

最新《模仿学习 - Imitation Learning》教程，63页ppt，微软Kamil Ciosek

专知会员服务

66+阅读 · 2020年8月22日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【NeurIPS2019】模仿学习中的因果混乱问题 Causal Confusion in Imitation Learning

【NeurIPS2019】模仿学习中的因果混乱问题 Causal Confusion in Imitation Learning

专知会员服务

30+阅读 · 2019年12月10日

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

专知会员服务

24+阅读 · 2019年11月11日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】低维与高维空间中潜在表征的分析、建模与变换

《生态建模密码破译：建模与编程实践》美陆军最新报告

大模型解决方案白皮书：社交陪伴场景全流程落地指南

面向具身操作的视觉-语言-动作模型综述

相关资讯

浅聊对比学习（Contrastive Learning）

浅聊对比学习（Contrastive Learning）

极市平台

2+阅读 · 2022年7月26日

浅聊对比学习（Contrastive Learning）第一弹

浅聊对比学习（Contrastive Learning）第一弹

PaperWeekly

0+阅读 · 2022年6月10日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Style Transfer Enabled Sim2Real Framework for Efficient Learning of Robotic Ultrasound Image Analysis Using Simulated Data

Arxiv

0+阅读 · 2023年5月16日

Meta-optimized Contrastive Learning for Sequential Recommendation

Arxiv

0+阅读 · 2023年5月16日

A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes

Arxiv

0+阅读 · 2023年5月15日

PPO-ABR: Proximal Policy Optimization based Deep Reinforcement Learning for Adaptive BitRate streaming

Arxiv

0+阅读 · 2023年5月14日

Selective imitation on the basis of reward function similarity

Arxiv

0+阅读 · 2023年5月12日

Inapplicable Actions Learning for Knowledge Transfer in Reinforcement Learning

Arxiv

0+阅读 · 2023年5月11日

Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

Arxiv

13+阅读 · 2021年12月20日

Active Learning for Domain Adaptation: An Energy-based Approach

Arxiv

13+阅读 · 2021年12月2日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

Deep Reinforcement Learning for List-wise Recommendations

Arxiv

13+阅读 · 2018年1月5日

相关基金

基于重要性采样的并行离策略强化学习方法研究

国家自然科学基金

23+阅读 · 2015年12月31日

资源受限环境下实时超分辨重建方法研究

国家自然科学基金

1+阅读 · 2014年12月31日

基于不确定规划理论的无线网络视频流传输研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于智能在线虚拟参考反馈整定的控制方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于机器学习的混合动力电动汽车在线智能控制研究

国家自然科学基金

2+阅读 · 2013年12月31日

基于一致性数据融合的磨机负荷自适应软测量研究

国家自然科学基金

0+阅读 · 2013年12月31日

多元函数的稀疏逼近与随机逼近

国家自然科学基金

1+阅读 · 2012年12月31日

淹没环境下前混合磨料水射流对低透气性煤层的冲蚀机理及增透效应研究

国家自然科学基金

0+阅读 · 2012年12月31日

大规模机器学习问题的结构优化方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员