Training large neural networks is known to be time-consuming, with learning durations of days or even weeks. To address this problem, large-batch optimization was introduced, demonstrating that scaling mini-batch sizes with appropriate learning rate adjustments can speed up the training process by orders of magnitude. While long training times were not typically a major issue for model-free deep offline RL algorithms, the recently introduced Q-ensemble methods that achieve state-of-the-art performance have made this issue more relevant by notably extending the training duration. In this work, we demonstrate how this class of methods can benefit from large-batch optimization, an approach commonly overlooked by the deep offline RL community. We show that scaling the mini-batch size and naively adjusting the learning rate allows for (1) a reduced size of the Q-ensemble, (2) stronger penalization of out-of-distribution actions, and (3) improved convergence time, effectively shortening training duration by 3-4x on average.
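As a minimal sketch of what a "naive" learning rate adjustment under batch-size scaling can look like, the snippet below applies the linear scaling rule (learning rate grows proportionally with batch size); the base values and the choice of a linear rule are illustrative assumptions, and the paper's exact adjustment may differ.

```python
def scaled_lr(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Linearly scale the learning rate with the mini-batch size."""
    return base_lr * batch_size / base_batch_size


# Hypothetical base setup tuned for batch size 256 with lr 3e-4,
# scaled up to a large batch of 4096.
base_lr, base_batch = 3e-4, 256
large_batch = 4096
print(scaled_lr(base_lr, base_batch, large_batch))  # 0.0048
```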