October 20, 2022

The Phenomenon of Policy Churn


Tom Schaul, André Barreto, John Quan, Georg Ostrovski

From arXiv; published at NeurIPS 2022

We identify and study the phenomenon of policy churn, that is, the rapid change of the greedy policy in value-based reinforcement learning. Policy churn operates at a surprisingly rapid pace, changing the greedy action in a large fraction of states within a handful of learning updates (in a typical deep RL set-up such as DQN on Atari). We characterise the phenomenon empirically, verifying that it is not limited to specific algorithm or environment properties. A number of ablations help whittle down the plausible explanations on why churn occurs to just a handful, all related to deep learning. Finally, we hypothesise that policy churn is a beneficial but overlooked form of implicit exploration that casts $\epsilon$-greedy exploration in a fresh light, namely that $\epsilon$-noise plays a much smaller role than expected.
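As a rough illustration of the quantity described above, policy churn can be pictured as the fraction of states whose greedy action differs before and after some learning updates. The sketch below is an assumption-laden toy, not the authors' measurement protocol: the function names `greedy_actions` and `policy_churn`, the tabular Q-value layout, and the example numbers are all hypothetical.

```python
from typing import List, Sequence


def greedy_actions(q_values: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the highest-valued action for each state.

    Ties are broken in favour of the lowest action index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in q_values]


def policy_churn(
    q_before: Sequence[Sequence[float]],
    q_after: Sequence[Sequence[float]],
) -> float:
    """Fraction of states whose greedy action differs between two Q-tables."""
    if len(q_before) != len(q_after):
        raise ValueError("both Q-tables must cover the same states")
    if not q_before:
        raise ValueError("need at least one state")
    before = greedy_actions(q_before)
    after = greedy_actions(q_after)
    changed = sum(a != b for a, b in zip(before, after))
    return changed / len(before)


# Hypothetical Q-values for 4 states and 3 actions,
# before and after a few (imagined) learning updates.
q_old = [[1.0, 0.5, 0.2], [0.1, 0.9, 0.3], [0.4, 0.4, 0.8], [0.7, 0.6, 0.1]]
q_new = [[0.9, 1.1, 0.2], [0.1, 0.8, 0.3], [0.4, 0.9, 0.8], [0.7, 0.6, 0.1]]

print(policy_churn(q_old, q_new))  # 0.5: the greedy action changed in 2 of 4 states
```

In a deep RL setting such as DQN, the two tables would presumably be replaced by the network's action values on a fixed batch of states, evaluated before and after the updates; that substitution is likewise an assumption here.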

