Cross-domain Imitation from Observations - Zhuanzhi Papers


May 20, 2021

Cross-domain Imitation from Observations


Dripta S. Raychaudhuri, Sujoy Paul, Jeroen van Baar, Amit K. Roy-Chowdhury

From arXiv. Accepted at ICML 2021 as a long presentation.

Imitation learning seeks to circumvent the difficulty in designing proper reward functions for training agents by utilizing expert behavior. With environments modeled as Markov Decision Processes (MDP), most of the existing imitation algorithms are contingent on the availability of expert demonstrations in the same MDP as the one in which a new imitation policy is to be learned. In this paper, we study the problem of how to imitate tasks when there exist discrepancies between the expert and agent MDP. These discrepancies across domains could include differing dynamics, viewpoint, or morphology; we present a novel framework to learn correspondences across such domains. Importantly, in contrast to prior works, we use unpaired and unaligned trajectories containing only states in the expert domain, to learn this correspondence. We utilize a cycle-consistency constraint on both the state space and a domain agnostic latent space to do this. In addition, we enforce consistency on the temporal position of states via a normalized position estimator function, to align the trajectories across the two domains. Once this correspondence is found, we can directly transfer the demonstrations on one domain to the other and use it for imitation. Experiments across a wide variety of challenging domains demonstrate the efficacy of our approach.
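The two constraints named in the abstract, cycle consistency and consistency of normalized temporal position, can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the networks or exact losses, so the linear maps `agent_to_expert` and `expert_to_agent`, the `estimate_position` function, the squared-error losses, and all other names here are hypothetical stand-ins. The mapping direction (agent to expert and back) is also chosen arbitrarily for illustration.

```python
import numpy as np


def cycle_consistency_loss(states, forward_map, backward_map):
    """Mean squared error between states and their round trip through both maps."""
    round_trip = backward_map(forward_map(states))
    return float(np.mean((round_trip - states) ** 2))


def normalized_positions(num_steps):
    """Temporal position of each step in a trajectory, scaled to [0, 1]."""
    if num_steps < 2:
        return np.zeros(num_steps)
    return np.arange(num_steps) / (num_steps - 1)


def temporal_position_loss(translated_states, source_positions, position_estimator):
    """Penalize translated states whose estimated position differs from the source position."""
    predicted = position_estimator(translated_states)
    return float(np.mean((predicted - source_positions) ** 2))


# Toy setup (all hypothetical): 2-D agent states and 3-D expert states,
# related by a fixed linear map instead of learned networks.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A_pinv = np.linalg.pinv(A)  # left inverse of A, since A has full column rank


def agent_to_expert(agent_states):
    return agent_states @ A.T


def expert_to_agent(expert_states):
    return expert_states @ A_pinv.T


def estimate_position(expert_states):
    # Assumed estimator: in this toy domain the first coordinate encodes progress.
    return expert_states[:, 0]


positions = normalized_positions(5)
agent_traj = np.stack([positions, np.sin(positions)], axis=1)

print(cycle_consistency_loss(agent_traj, agent_to_expert, expert_to_agent))
print(temporal_position_loss(agent_to_expert(agent_traj), positions, estimate_position))
```

In this toy setting both losses are near zero because the maps are exact inverses on the agent states and the translated trajectory preserves temporal order. Translating a time-reversed trajectory, or using a poor backward map, makes the corresponding loss positive, which is the kind of signal such constraints would provide during training.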

