Minimax Optimal Online Imitation Learning via Replay Estimation


June 2, 2022


Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran

Online imitation learning is the problem of how best to mimic expert demonstrations, given access to the environment or an accurate simulator. Prior work has shown that in the infinite sample regime, exact moment matching achieves value equivalence to the expert policy. However, in the finite sample regime, even if one has no optimization error, empirical variance can lead to a performance gap that scales with $H^2 / N$ for behavioral cloning and $H / \sqrt{N}$ for online moment matching, where $H$ is the horizon and $N$ is the size of the expert dataset. We introduce the technique of replay estimation to reduce this empirical variance: by repeatedly executing cached expert actions in a stochastic simulator, we compute a smoother expert visitation distribution estimate to match. In the presence of general function approximation, we prove a meta theorem reducing the performance gap of our approach to the parameter estimation error for offline classification (i.e. learning the expert policy). In the tabular setting or with linear function approximation, our meta theorem shows that the performance gap incurred by our approach achieves the optimal $\widetilde{O}\left( \min\left( H^{3/2} / N,\ H / \sqrt{N} \right) \right)$ dependency, under significantly weaker assumptions compared to prior work. We implement multiple instantiations of our approach on several continuous control tasks and find that we are able to significantly improve policy performance across a variety of dataset sizes.
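The idea of replaying cached expert actions in a stochastic simulator can be illustrated with a small tabular toy. The sketch below is not the authors' estimator: the three-state environment, the deterministic expert, the horizon value, and every function name are hypothetical choices made for illustration. In particular, it simply stops a rollout when it reaches a state with no cached expert action, a case the abstract does not describe how to handle.

```python
import random
from collections import Counter

H = 5            # horizon (toy value, an assumption)
NUM_STATES = 3   # size of the hypothetical tabular state space


def expert_action(state):
    """Hypothetical deterministic expert policy."""
    return state % 2


def step(state, action, rng):
    """Hypothetical stochastic simulator: advance with probability 0.8, else stay."""
    if rng.random() < 0.8:
        return (state + action + 1) % NUM_STATES
    return state


def collect_expert_data(n_traj, rng):
    """Roll out the expert n_traj times, recording (state, action) pairs."""
    data = []
    for _ in range(n_traj):
        state, traj = 0, []
        for _ in range(H):
            action = expert_action(state)
            traj.append((state, action))
            state = step(state, action, rng)
        data.append(traj)
    return data


def empirical_visitation(data):
    """Plain empirical state-action visitation frequencies from the dataset."""
    counts = Counter(sa for traj in data for sa in traj)
    total = sum(counts.values())
    return {sa: c / total for sa, c in counts.items()}


def replay_visitation(data, n_rollouts, rng):
    """Re-execute cached expert actions in the simulator and average the visits.

    Rollouts stop when they reach a state without a cached action; this
    truncation is a simplification of this sketch only.
    """
    cache = {s: a for traj in data for s, a in traj}  # cached expert actions
    counts = Counter()
    for _ in range(n_rollouts):
        state = 0
        for _ in range(H):
            if state not in cache:
                break
            action = cache[state]
            counts[(state, action)] += 1
            state = step(state, action, rng)
    total = sum(counts.values())
    return {sa: c / total for sa, c in counts.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    demos = collect_expert_data(n_traj=3, rng=rng)
    print("empirical:", empirical_visitation(demos))
    print("replay:   ", replay_visitation(demos, n_rollouts=2000, rng=rng))
```

With only a handful of demonstrations, the empirical frequencies depend heavily on which transitions happened to be sampled, while the replayed estimate averages over many simulated transitions from the same cached actions. This is the variance-reduction intuition described in the abstract, shown here only on a toy problem.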

