Training deep reinforcement learning (DRL) models usually requires high computation costs. Therefore, compressing DRL models holds great potential for accelerating training and easing model deployment. However, existing methods for producing small models mainly adopt knowledge distillation, which iteratively trains a dense network, so the training process still demands massive computing resources. Indeed, sparse training from scratch in DRL has not been well explored and is particularly challenging due to the non-stationarity in bootstrap training. In this work, we propose a novel sparse DRL training framework, "the Rigged Reinforcement Learning Lottery" (RLx2), which builds upon gradient-based topology evolution and is capable of training a DRL agent entirely with a sparse network. Specifically, RLx2 introduces a novel multi-step TD target mechanism together with a dynamic-capacity replay buffer to achieve robust value learning and efficient topology exploration in sparse models. RLx2 reaches state-of-the-art sparse training performance on several tasks, showing $7.5\times$-$20\times$ model compression with less than 3% performance degradation, and up to $20\times$ and $50\times$ FLOPs reduction for training and inference, respectively.
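To make the two mechanisms named above more concrete, the sketch below illustrates, under our own simplifying assumptions rather than as the RLx2 reference implementation, (i) a RigL-style gradient-based topology update that drops the smallest-magnitude active weights and regrows the same number of connections where the gradient magnitude is largest, and (ii) a multi-step (N-step) TD target computed by discounting rewards back from a bootstrapped target-network value. All names (`topology_update`, `n_step_td_target`, `drop_frac`) are illustrative.

```python
# Illustrative sketch only; not the authors' implementation.
import torch


def topology_update(weight, grad, mask, drop_frac=0.1):
    """RigL-style update: drop the smallest-magnitude active weights,
    then regrow the same number of inactive connections with the
    largest gradient magnitude, keeping total sparsity fixed."""
    n_active = int(mask.sum().item())
    n_change = int(drop_frac * n_active)
    if n_change == 0:
        return mask

    # Drop: among active weights, deactivate the n_change smallest |w|.
    w_mag = torch.where(mask.bool(), weight.abs(),
                        torch.full_like(weight, float("inf")))
    drop_idx = torch.topk(w_mag.flatten(), n_change, largest=False).indices
    new_mask = mask.clone().flatten()
    new_mask[drop_idx] = 0.0

    # Grow: among inactive weights, activate the n_change largest |grad|.
    g_mag = torch.where(new_mask.bool(),
                        torch.full_like(grad.flatten(), -float("inf")),
                        grad.abs().flatten())
    grow_idx = torch.topk(g_mag, n_change, largest=True).indices
    new_mask[grow_idx] = 1.0
    return new_mask.view_as(mask)


def n_step_td_target(rewards, bootstrap_value, dones, gamma=0.99):
    """Multi-step TD target: discounted sum of N rewards plus a
    discounted bootstrap value from the target network."""
    target = bootstrap_value
    for r, d in zip(reversed(rewards), reversed(dones)):
        target = r + gamma * (1.0 - d) * target
    return target
```

In a sparse-training loop, `topology_update` would be applied periodically to each layer's binary mask so that connectivity evolves while the overall parameter count stays fixed, and `n_step_td_target` would replace the usual one-step bootstrap when computing the critic's regression target.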