改进强化学习算法:实现最佳学习率政策 (Improving reinforcement learning algorithms: towards optimal learning rate policies) - 专知论文

会员服务 ·

0

优化器 · 学成 · 学习率 · 强化学习 · INTERACT ·

2021 年 10 月 17 日

Improving reinforcement learning algorithms: towards optimal learning rate policies

翻译：改进强化学习算法:实现最佳学习率政策

Othmane Mounjid,Charles-Albert Lehalle

This paper investigates to what extent one can improve reinforcement learning algorithms. Our study is split in three parts. First, our analysis shows that the classical asymptotic convergence rate $O(1/\sqrt{N})$ is pessimistic and can be replaced by $O((\log(N)/N)^{\beta})$ with $\frac{1}{2}\leq \beta \leq 1$ and $N$ the number of iterations. Second, we propose a dynamic optimal policy for the choice of the learning rate $(\gamma_k)_{k\geq 0}$ used in stochastic approximation (SA). We decompose our policy into two interacting levels: the inner and the outer level. In the inner level, we present the \nameref{Alg:v_4_s} algorithm (for "PAst Sign Search") which, based on a predefined sequence $(\gamma^o_k)_{k\geq 0}$, constructs a new sequence $(\gamma^i_k)_{k\geq 0}$ whose error decreases faster. In the outer level, we propose an optimal methodology for the selection of the predefined sequence $(\gamma^o_k)_{k\geq 0}$. Third, we show empirically that our selection methodology of the learning rate outperforms significantly standard algorithms used in reinforcement learning (RL) in the three following applications: the estimation of a drift, the optimal placement of limit orders and the optimal execution of large number of shares.

翻译：本文调查我们在多大程度上可以改进强化学习算法。我们的研究分为三个部分。首先, 我们的分析显示, 古典失常趋同率 $O( 1/\ sqrt{N} $) 是悲观的, 可以用$O( log( N)/ N)\\\\\ beeta} 美元替换为$( frac{ 1\\\\\\\\\\\\\\\\\ \leq 1 美元) 和迭代数。其次, 我们提出一个动态的最佳政策, 用于选择 $( gamma_ k) / k\ g\ g Q) 的学习率。我们将我们的政策分解成两个互动级别 : 内和外等级别。在内部级别, 我们用 nameref{ Alg: v_ 4_ sq} 算法( “ Past sign search 搜索 ” ), 根据一个预定义的序列 $(\ gammamamable rial) ex relection a rial relection rial relection riew ristration a mess lemental riewation riew riew lemental riew lement lemental lection lemental lection a.

0

相关内容

优化器

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

专知会员服务

17+阅读 · 2020年7月14日

可解释强化学习，Explainable Reinforcement Learning: A Survey

可解释强化学习，Explainable Reinforcement Learning: A Survey

专知会员服务

131+阅读 · 2020年5月14日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

【2020新书】算法与数据结构实战，286页pdf，Algorithms Data Structures in Action

【2020新书】算法与数据结构实战，286页pdf，Algorithms Data Structures in Action

专知会员服务

107+阅读 · 2020年2月22日

PyTorch深度学习零基础入门《First steps towards Deep Learning with pyTorch》

PyTorch深度学习零基础入门《First steps towards Deep Learning with pyTorch》

专知会员服务

120+阅读 · 2019年10月28日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

spinningup.openai 强化学习资源完整

spinningup.openai 强化学习资源完整

CreateAMind

6+阅读 · 2018年12月17日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning

Arxiv

0+阅读 · 2021年12月14日

Probability Density Estimation Based Imitation Learning

Arxiv

0+阅读 · 2021年12月13日

Verified Approximation Algorithms

Arxiv

0+阅读 · 2021年12月13日

Lifelong Hyper-Policy Optimization with Multiple Importance Sampling Regularization

Arxiv

0+阅读 · 2021年12月13日

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Arxiv

0+阅读 · 2021年12月10日

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Arxiv

9+阅读 · 2021年2月23日

Model-based Adversarial Meta-Reinforcement Learning

Arxiv

5+阅读 · 2020年6月16日

Causal Discovery with Reinforcement Learning

Arxiv

4+阅读 · 2020年3月19日

Reinforcement Learning with Perturbed Rewards

Arxiv

4+阅读 · 2018年10月5日

Variance Reduction Methods for Sublinear Reinforcement Learning

Arxiv

4+阅读 · 2018年4月25日

VIP会员

文章信息

相关主题

相关VIP内容

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

专知会员服务

17+阅读 · 2020年7月14日

可解释强化学习，Explainable Reinforcement Learning: A Survey

可解释强化学习，Explainable Reinforcement Learning: A Survey

专知会员服务

131+阅读 · 2020年5月14日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

【2020新书】算法与数据结构实战，286页pdf，Algorithms Data Structures in Action

【2020新书】算法与数据结构实战，286页pdf，Algorithms Data Structures in Action

专知会员服务

107+阅读 · 2020年2月22日

PyTorch深度学习零基础入门《First steps towards Deep Learning with pyTorch》

PyTorch深度学习零基础入门《First steps towards Deep Learning with pyTorch》

专知会员服务

120+阅读 · 2019年10月28日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

数据驱动死亡：以色列AI战争机器如何锁定目标

【普林斯顿博士论文】通过以人为本的评估推动负责任的人工智能

ICML 2025 | BiAssemble: 双臂机器人几何拼合问题的协同可供性学习

ICML 2025杰出论文出炉：8篇获奖，南大研究者榜上有名

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

spinningup.openai 强化学习资源完整

spinningup.openai 强化学习资源完整

CreateAMind

6+阅读 · 2018年12月17日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning

Arxiv

0+阅读 · 2021年12月14日

Probability Density Estimation Based Imitation Learning

Arxiv

0+阅读 · 2021年12月13日

Verified Approximation Algorithms

Arxiv

0+阅读 · 2021年12月13日

Lifelong Hyper-Policy Optimization with Multiple Importance Sampling Regularization

Arxiv

0+阅读 · 2021年12月13日

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Arxiv

0+阅读 · 2021年12月10日

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Arxiv

9+阅读 · 2021年2月23日

Model-based Adversarial Meta-Reinforcement Learning

Arxiv

5+阅读 · 2020年6月16日

Causal Discovery with Reinforcement Learning

Arxiv

4+阅读 · 2020年3月19日

Reinforcement Learning with Perturbed Rewards

Arxiv

4+阅读 · 2018年10月5日

Variance Reduction Methods for Sublinear Reinforcement Learning

Arxiv

4+阅读 · 2018年4月25日

微信扫码咨询专知VIP会员