Target Network and Truncation Overcome The Deadly Triad in $Q$-Learning - 专知论文

May 3, 2022

Zaiwei Chen, John Paul Clarke, Siva Theja Maguluri

$Q$-learning with function approximation is one of the most empirically successful yet theoretically mysterious reinforcement learning (RL) algorithms, and its analysis was identified in Sutton (1999) as one of the most important theoretical open problems in the RL community. Even in the basic setting of linear function approximation, there are well-known examples in which the algorithm diverges. In this work, we show that a target network and truncation together are enough to provably stabilize $Q$-learning with linear function approximation, and we establish finite-sample guarantees. The result implies an $O(\epsilon^{-2})$ sample complexity up to a function approximation error. Moreover, our results require neither strong assumptions nor modification of the problem parameters, unlike the existing literature.
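To make the two ingredients concrete, the following is a minimal sketch of $Q$-learning with linear function approximation, a periodically synchronized target parameter, and truncated target values. It is an illustration under stated assumptions, not the authors' exact algorithm: the toy MDP, the feature table, the uniform sampling of state-action pairs, the step size, and the truncation bound `r_max / (1 - gamma)` are all hypothetical choices made here for the example.

```python
import random


def clip(x, bound):
    """Truncate x to the interval [-bound, bound]."""
    return max(-bound, min(bound, x))


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def q_learning_target_truncation(features, step, n_states, n_actions,
                                 gamma=0.9, r_max=1.0, alpha=0.05,
                                 outer_iters=20, inner_iters=500, seed=0):
    """Linear Q-learning with a frozen target parameter and truncated targets.

    features(s, a) returns the feature vector of a state-action pair.
    step(s, a, rng) returns (reward, next_state).
    State-action pairs are sampled uniformly (an assumption of this sketch).
    """
    rng = random.Random(seed)
    dim = len(features(0, 0))
    bound = r_max / (1.0 - gamma)  # assumed truncation level
    theta_target = [0.0] * dim     # target-network parameter

    for _ in range(outer_iters):
        theta = list(theta_target)  # start the inner loop from the target
        for _ in range(inner_iters):
            s = rng.randrange(n_states)
            a = rng.randrange(n_actions)
            reward, s_next = step(s, a, rng)
            # Bootstrap from the frozen target parameter, truncating each value.
            next_q = max(clip(dot(features(s_next, b), theta_target), bound)
                         for b in range(n_actions))
            phi = features(s, a)
            td_error = reward + gamma * next_q - dot(phi, theta)
            theta = [w + alpha * td_error * x for w, x in zip(theta, phi)]
        theta_target = theta  # synchronize the target parameter
    return theta_target


def toy_features(s, a):
    """Hypothetical 2-dimensional features for a 2-state, 2-action MDP."""
    table = {(0, 0): (1.0, 0.0), (0, 1): (0.0, 1.0),
             (1, 0): (0.5, 0.5), (1, 1): (1.0, 0.5)}
    return table[(s, a)]


def toy_step(s, a, rng):
    """Hypothetical dynamics: action 0 jumps randomly, action 1 flips the state."""
    reward = 1.0 if (s, a) == (1, 1) else 0.0
    s_next = rng.randrange(2) if a == 0 else 1 - s
    return reward, s_next


if __name__ == "__main__":
    theta = q_learning_target_truncation(toy_features, toy_step, 2, 2)
    print("learned parameter:", theta)
```

Because the bootstrap target is computed from the frozen parameter and then truncated, each inner loop is a regression toward bounded targets, which is the intuition behind the stability claim in the abstract. The sketch does not reproduce the finite-sample analysis.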
