Adversarial Imitation Learning (AIL) is a class of popular state-of-the-art Imitation Learning algorithms commonly used in robotics. In AIL, an artificial adversary's misclassification is used as a reward signal that is optimized by any standard Reinforcement Learning (RL) algorithm. Unlike most RL settings, the reward in AIL is \emph{differentiable}, but current model-free RL algorithms do not make use of this property to train a policy. The reward in AIL is also shaped since it comes from an adversary. We leverage the differentiability of the shaped AIL reward function and formulate a class of Actor Residual Critic (ARC) RL algorithms. ARC algorithms draw a parallel to the standard Actor-Critic (AC) algorithms in the RL literature and use a residual critic, the $C$ function (instead of the standard $Q$ function), to approximate only the discounted future return (excluding the immediate reward). ARC algorithms have convergence properties similar to those of standard AC algorithms, with the additional advantage that the gradient through the immediate reward is exact. For the discrete (tabular) case with finite states, actions, and known dynamics, we prove that policy iteration with the $C$ function converges to an optimal policy. In the continuous case with function approximation and unknown dynamics, we experimentally show that ARC-aided AIL outperforms standard AIL in simulated continuous-control and real robotic manipulation tasks. ARC algorithms are simple to implement and can be incorporated into any existing AIL implementation with an AC algorithm. Video and link to code are available at: https://sites.google.com/view/actor-residual-critic.
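As a minimal sketch of the decomposition the abstract refers to (notation assumed here, not quoted from the paper): the residual critic estimates only the discounted future return, so

$$
C^{\pi}(s_t, a_t) \;=\; Q^{\pi}(s_t, a_t) - r(s_t, a_t) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{k \ge 1} \gamma^{k}\, r(s_{t+k}, a_{t+k})\right],
$$

and, assuming an AC-style update with a deterministic or reparameterized policy $\pi_\theta$, the actor maximizes $\mathbb{E}_{s}\big[\, r(s, \pi_\theta(s)) + C(s, \pi_\theta(s)) \,\big]$, where the gradient through the differentiable AIL reward $r$ is exact and only $C$ is approximated.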