Explaining Fast Improvement in Online Imitation Learning - 专知 (Zhuanzhi) Papers


February 22, 2021

Explaining Fast Improvement in Online Imitation Learning


Xinyan Yan, Byron Boots, Ching-An Cheng

From arXiv, 22 pages, 2 figures

Online imitation learning (IL) is an algorithmic framework that leverages interactions with expert policies for efficient policy optimization. Here policies are optimized by performing online learning on a sequence of loss functions that encourage the learner to mimic expert actions, and if the online learning has no regret, the agent can provably learn an expert-like policy. Online IL has demonstrated empirical successes in many applications and interestingly, its policy improvement speed observed in practice is usually much faster than existing theory suggests. In this work, we provide an explanation of this phenomenon. Let $\xi$ denote the policy class bias and assume the online IL loss functions are convex, smooth, and non-negative. We prove that, after $N$ rounds of online IL with stochastic feedback, the policy improves in $\tilde{O}(1/N + \sqrt{\xi/N})$ in both expectation and high probability. In other words, we show that adopting a sufficiently expressive policy class in online IL has two benefits: both the policy improvement speed increases and the performance bias decreases.
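To make the stated rate concrete, the sketch below evaluates the two terms of $\tilde{O}(1/N + \sqrt{\xi/N})$ separately. It is only an illustration of the formula in the abstract: constants and logarithmic factors are dropped, and the function names (`rate_terms`, `rate_bound`, `crossover_rounds`) are hypothetical helpers, not anything from the paper. Setting $1/N = \sqrt{\xi/N}$ gives $N = 1/\xi$, so under these simplifications the fast $1/N$ term dominates for roughly the first $1/\xi$ rounds, and a smaller bias $\xi$ extends that regime.

```python
import math
from typing import Tuple


def rate_terms(n_rounds: int, xi: float) -> Tuple[float, float]:
    """Return (1/N, sqrt(xi/N)), the two terms of the bound.

    Assumption: constants and log factors hidden by the tilde-O are dropped.
    """
    if n_rounds <= 0:
        raise ValueError("n_rounds must be positive")
    if xi < 0:
        raise ValueError("xi (policy class bias) must be non-negative")
    return 1.0 / n_rounds, math.sqrt(xi / n_rounds)


def rate_bound(n_rounds: int, xi: float) -> float:
    """Sum of the two terms: 1/N + sqrt(xi/N)."""
    fast_term, slow_term = rate_terms(n_rounds, xi)
    return fast_term + slow_term


def crossover_rounds(xi: float) -> float:
    """Number of rounds N at which 1/N equals sqrt(xi/N), i.e. N = 1/xi.

    With zero bias the 1/N term dominates forever, so return infinity.
    """
    if xi < 0:
        raise ValueError("xi (policy class bias) must be non-negative")
    return math.inf if xi == 0 else 1.0 / xi


if __name__ == "__main__":
    for xi in (0.0, 0.01, 0.1):
        for n in (10, 100, 1000):
            fast, slow = rate_terms(n, xi)
            print(f"xi={xi:<5} N={n:<5} 1/N={fast:.4f} sqrt(xi/N)={slow:.4f}")
```

With $\xi = 0$ the bound reduces to the $1/N$ term alone, which matches the abstract's claim that a sufficiently expressive policy class both speeds up improvement and lowers the performance bias.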

