Deep reinforcement learning algorithms that learn policies by trial-and-error must learn from limited amounts of data collected by actively interacting with the environment. While many prior works have shown that proper regularization techniques are crucial for enabling data-efficient RL, a general understanding of the bottlenecks in data-efficient RL has remained elusive. Consequently, it has been difficult to devise a universal technique that works well across all domains. In this paper, we attempt to understand the primary bottleneck in sample-efficient deep RL by examining several potential hypotheses such as non-stationarity, excessive action distribution shift, and overfitting. We perform a thorough empirical analysis on state-based DeepMind Control Suite (DMC) tasks in a controlled and systematic way to show that high temporal-difference (TD) error on a validation set of transitions is the main culprit that severely degrades the performance of deep RL algorithms, and that prior methods which lead to good performance do, in fact, keep the validation TD error low. This observation gives us a robust principle for making deep RL efficient: we can hill-climb on the validation TD error using any form of regularization technique from supervised learning. We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
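To make the quantity we hill-climb on concrete, the sketch below shows one way to estimate TD error on a held-out set of transitions and to select among candidate regularization strengths by that estimate. This is a minimal PyTorch-style sketch under assumptions of our own (the `QNetwork` architecture, the `train_fn` callback, and the dropout candidates are illustrative placeholders), not the implementation used in the paper.

```python
# Sketch (assumed, not the paper's code): measure TD error on held-out
# transitions and use it to pick among candidate regularization settings.
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Small state-action value network with an optional dropout rate."""

    def __init__(self, obs_dim: int, act_dim: int, dropout: float = 0.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


@torch.no_grad()
def validation_td_error(q_net, target_q_net, policy, batch, gamma=0.99):
    """Mean squared TD error on held-out transitions (s, a, r, s', done)."""
    obs, act, rew, next_obs, done = batch
    next_act = policy(next_obs)  # a' drawn from the current policy
    target = rew + gamma * (1.0 - done) * target_q_net(next_obs, next_act)
    return ((q_net(obs, act) - target) ** 2).mean().item()


def select_by_validation_td_error(candidates, train_fn, target_q_net, policy, val_batch):
    """Return the candidate (e.g., dropout rate) with the lowest validation TD error."""
    scores = {}
    for dropout in candidates:
        q_net = train_fn(dropout)  # train a Q-network under this regularization setting
        scores[dropout] = validation_td_error(q_net, target_q_net, policy, val_batch)
    return min(scores, key=scores.get), scores
```

In an online setting, one could rerun this selection periodically as new transitions arrive, so the active regularization strength tracks whichever candidate currently achieves the lowest validation TD error.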