Despite the empirical success of the deep Q network (DQN) reinforcement learning algorithm and its variants, DQN is still not well understood and its convergence is not guaranteed. In this work, we show that DQN can diverge and cease to operate in realistic settings. Although there exist gradient-based convergent methods, we show that they actually have inherent problems in their learning behaviour and elucidate why they often fail in practice. To overcome these problems, we propose a convergent DQN algorithm (C-DQN) obtained by carefully modifying DQN, and we show that the algorithm is convergent and can work with discount factors as large as 0.9998. It learns robustly in difficult settings and, within a moderate computational budget, can learn several difficult games in the Atari 2600 benchmark that DQN fails to learn. Our code has been publicly released and can be used to reproduce our results.