ARC -- -- ARC -- -- ARC -- -- ARC -- -- ARC -- -- ARC -- -- ARC -- -- ARC -- -- ARC -- -- ARC -- -- ARC -- -- ARC -- ARC -- ARC -- -- ARC -- ARC -- ARC -- -- ARC -- -- ARC -- -- ARC -- ARC -- ARC -- -- ARC ARC -- -- ARC -- ARC -- ARC (ARC -- Actor Residual Critic for Adversarial Imitation Learning) - 专知论文

会员服务 ·

0

Learning · 评论员 · 泛函 · 优化器 · CASE ·

2022 年 6 月 5 日

ARC -- Actor Residual Critic for Adversarial Imitation Learning

翻译：ARC -- -- ARC -- -- ARC -- -- ARC -- -- ARC -- -- ARC -- -- ARC -- -- ARC -- -- ARC -- -- ARC -- -- ARC -- -- ARC -- ARC -- ARC -- -- ARC -- ARC -- ARC -- -- ARC -- -- ARC -- -- ARC -- ARC -- ARC -- -- ARC ARC -- -- ARC -- ARC -- ARC

Ankur Deka,Changliu Liu,Katia Sycara

Adversarial Imitation Learning (AIL) is a class of popular state-of-the-art Imitation Learning algorithms where an artificial adversary's misclassification is used as a reward signal and is optimized by any standard Reinforcement Learning (RL) algorithm. Unlike most RL settings, the reward in AIL is differentiable but model-free RL algorithms do not make use of this property to train a policy. In contrast, we leverage the differentiability property of the AIL reward function and formulate a class of Actor Residual Critic (ARC) RL algorithms that draw a parallel to the standard Actor-Critic (AC) algorithms in RL literature and uses a residual critic, C function (instead of the standard Q function) to approximate only the discounted future return (excluding the immediate reward). ARC algorithms have similar convergence properties as the standard AC algorithms with the additional advantage that the gradient through the immediate reward is exact. For the discrete (tabular) case with finite states, actions, and known dynamics, we prove that policy iteration with $C$ function converges to an optimal policy. In the continuous case with function approximation and unknown dynamics, we experimentally show that ARC aided AIL outperforms standard AIL in simulated continuous-control and real robotic manipulation tasks. ARC algorithms are simple to implement and can be incorporated into any existing AIL implementation with an AC algorithm.

翻译：AIL 是一种受欢迎的最先进的模拟学习算法,其中人为对手错误分类用作奖赏信号,并被任何标准的加强学习算法优化。与大多数 RL 设置不同, AIL 的奖赏是不同的,但没有模型的RL 算法没有利用该属性来培训政策。相比之下,我们利用AIL 奖赏功能的可变性,并制定了一种ARC 算法,与RL 文献中标准的Actor-Critical算法平行,并使用残余评论家C 函数(而不是标准Q函数)来优化。 ARC 算法具有类似于标准 AC 算法的趋同性,其额外优势是,通过直接奖赏的梯度可以准确。对于与固定状态、动作和已知动态的ARC 分解(ARC) 和已知的A 运算算算法,我们证明政策与A AL AS 最不相适应的A 和ARC 的A AS 校正校程功能, 与A AS AS 校准的A 校正校正校正校正校正校正校正校正校正显示校正校正校正校正校正校正校正校正校正校正校正校正校正

0

相关内容

Learning

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

【CVPR2020-清华大学】渐进对抗网络的细粒度域适应，Progressive Adversarial Networks

专知会员服务

27+阅读 · 2020年4月4日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

专知

10+阅读 · 2018年2月1日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

基于深度学习的高分辨率PolSAR影像暗目标判别

国家自然科学基金

3+阅读 · 2015年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

南极天文望远镜控制系统的自主实时性能评估方法的研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于Exemplar-Classifier思想的高分辨率光学遥感影像目标识别研究

国家自然科学基金

2+阅读 · 2013年12月31日

拟南芥与油菜种子油脂积累消减器(SFAR)对含油量形成的影响及分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

Haccpper环境中不锈钢表面活性与电化学噪声特征研究

国家自然科学基金

0+阅读 · 2012年12月31日

高效率反型叠层有机太阳能电池的制备及机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

利用蛋白质组学策略分析星星草根盐胁迫应答氧化还原敏感蛋白质

国家自然科学基金

0+阅读 · 2012年12月31日

环境诱导家蚕滞育的CREB调控机制

国家自然科学基金

0+阅读 · 2011年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

Optimal precision for GANs

Optimal precision for GANs

Arxiv

0+阅读 · 2022年7月21日

Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning

Arxiv

0+阅读 · 2022年7月21日

Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning

Arxiv

0+阅读 · 2022年7月19日

Transformers are Meta-Reinforcement Learners

Arxiv

15+阅读 · 2022年6月14日

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Arxiv

19+阅读 · 2022年5月13日

Adversarial and Contrastive Variational Autoencoder for Sequential Recommendation

Arxiv

17+阅读 · 2021年3月19日

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Arxiv

17+阅读 · 2020年9月8日

Generative Adversarial Networks: A Survey and Taxonomy

Generative Adversarial Networks: A Survey and Taxonomy

Arxiv

14+阅读 · 2019年6月4日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

Generative Adversarial Networks and Probabilistic Graph Models for Hyperspectral Image Classification

Arxiv

11+阅读 · 2018年2月10日

VIP会员

文章信息

相关主题

相关VIP内容

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

【CVPR2020-清华大学】渐进对抗网络的细粒度域适应，Progressive Adversarial Networks

专知会员服务

27+阅读 · 2020年4月4日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

面向向量的机器学习系统：跨栈方法

【ICCV2025】SO(3) 上连续非保守动力系统的预测

【ICCV2025】Lay2Story：扩展扩散 Transformer 以实现可切换布局的故事生成

人工智能治理全景综述

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

专知

10+阅读 · 2018年2月1日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Optimal precision for GANs

Optimal precision for GANs

Arxiv

0+阅读 · 2022年7月21日

Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning

Arxiv

0+阅读 · 2022年7月21日

Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning

Arxiv

0+阅读 · 2022年7月19日

Transformers are Meta-Reinforcement Learners

Arxiv

15+阅读 · 2022年6月14日

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Arxiv

19+阅读 · 2022年5月13日

Adversarial and Contrastive Variational Autoencoder for Sequential Recommendation

Arxiv

17+阅读 · 2021年3月19日

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Arxiv

17+阅读 · 2020年9月8日

Generative Adversarial Networks: A Survey and Taxonomy

Generative Adversarial Networks: A Survey and Taxonomy

Arxiv

14+阅读 · 2019年6月4日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

Generative Adversarial Networks and Probabilistic Graph Models for Hyperspectral Image Classification

Arxiv

11+阅读 · 2018年2月10日

相关基金

基于深度学习的高分辨率PolSAR影像暗目标判别

国家自然科学基金

3+阅读 · 2015年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

南极天文望远镜控制系统的自主实时性能评估方法的研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于Exemplar-Classifier思想的高分辨率光学遥感影像目标识别研究

国家自然科学基金

2+阅读 · 2013年12月31日

拟南芥与油菜种子油脂积累消减器(SFAR)对含油量形成的影响及分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

Haccpper环境中不锈钢表面活性与电化学噪声特征研究

国家自然科学基金

0+阅读 · 2012年12月31日

高效率反型叠层有机太阳能电池的制备及机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

利用蛋白质组学策略分析星星草根盐胁迫应答氧化还原敏感蛋白质

国家自然科学基金

0+阅读 · 2012年12月31日

环境诱导家蚕滞育的CREB调控机制

国家自然科学基金

0+阅读 · 2011年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员