Robust Imitation via Mirror Descent Inverse Reinforcement Learning - Zhuanzhi Papers


Inverse reinforcement learning · Learning · Robustness · Estimation/Estimator · Optimizer

October 20, 2022

Robust Imitation via Mirror Descent Inverse Reinforcement Learning


Dong-Sig Han, Hyunseo Kim, Hyundo Lee, Je-Hwan Ryu, Byoung-Tak Zhang

Recently, adversarial imitation learning has shown a scalable reward acquisition method for inverse reinforcement learning (IRL) problems. However, estimated reward signals often become uncertain and fail to train a reliable statistical model since the existing methods tend to solve hard optimization problems directly. Inspired by a first-order optimization method called mirror descent, this paper proposes to predict a sequence of reward functions, which are iterative solutions for a constrained convex problem. IRL solutions derived by mirror descent are tolerant to the uncertainty incurred by target density estimation since the amount of reward learning is regulated with respect to local geometric constraints. We prove that the proposed mirror descent update rule ensures robust minimization of a Bregman divergence in terms of a rigorous regret bound of $\mathcal{O}(1/T)$ for step sizes $\{\eta_t\}_{t=1}^{T}$. Our IRL method was applied on top of an adversarial framework, and it outperformed existing adversarial methods in an extensive suite of benchmarks.
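For readers unfamiliar with mirror descent, the following is a minimal textbook sketch, not the paper's IRL algorithm. It assumes the negative-entropy mirror map on the probability simplex, under which the update becomes a multiplicative (exponentiated-gradient) step and the associated Bregman divergence is the KL divergence. The objective `KL(x || target)`, the function names `kl`, `mirror_descent_step`, and `fit`, and the constant step size are all illustrative assumptions; the paper's reward-learning update and step-size schedule are not reproduced here.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions; q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mirror_descent_step(x, grad, eta):
    """One entropic mirror descent step on the probability simplex.

    With the negative-entropy mirror map, the update is
    x_new[i] proportional to x[i] * exp(-eta * grad[i]),
    followed by normalization back onto the simplex.
    """
    weights = [xi * math.exp(-eta * gi) for xi, gi in zip(x, grad)]
    total = sum(weights)
    return [w / total for w in weights]


def fit(target, steps=200, eta=0.5):
    """Minimize f(x) = KL(x || target) over the simplex (illustrative objective)."""
    n = len(target)
    x = [1.0 / n] * n  # start from the uniform distribution
    history = []
    for _ in range(steps):
        # Gradient of sum_i x_i * log(x_i / target_i) with respect to x_i.
        grad = [math.log(xi / ti) + 1.0 for xi, ti in zip(x, target)]
        x = mirror_descent_step(x, grad, eta)
        history.append(kl(x, target))
    return x, history


if __name__ == "__main__":
    estimate, history = fit([0.7, 0.2, 0.1])
    print([round(v, 4) for v in estimate])
    print(history[0], history[-1])
```

In this toy setting each step moves the iterate along a geometric interpolation between the current point and the target, so the step size `eta` directly limits how far a single update can travel. That is the sense in which the amount of learning per step is regulated by the local geometry.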


