Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL



May 18, 2023


Qinghua Liu, Gellért Weisz, András György, Chi Jin, Csaba Szepesvári

While policy optimization algorithms have played an important role in the recent empirical success of Reinforcement Learning (RL), the existing theoretical understanding of policy optimization remains rather limited: existing results are either restricted to tabular MDPs or suffer from highly suboptimal sample complexity, especially in online RL where exploration is necessary. This paper proposes a simple, efficient policy optimization framework for online RL, Optimistic NPG. Optimistic NPG can be viewed as a simple combination of the classic natural policy gradient (NPG) algorithm [Kakade, 2001] with optimistic policy evaluation subroutines that encourage exploration. For $d$-dimensional linear MDPs, Optimistic NPG is computationally efficient and learns an $\varepsilon$-optimal policy within $\tilde{O}(d^2/\varepsilon^3)$ samples, making it the first computationally efficient algorithm whose sample complexity has the optimal dimension dependence $\tilde{\Theta}(d^2)$. It also improves over state-of-the-art results for policy optimization algorithms [Zanette et al., 2021] by a factor of $d$. For general function approximation, which subsumes linear MDPs, Optimistic NPG is, to the best of our knowledge, also the first policy optimization algorithm that achieves polynomial sample complexity for learning near-optimal policies.
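To make the "NPG plus optimistic evaluation" idea concrete, the following is a minimal single-state sketch, not the paper's algorithm. It relies on the standard fact that, for tabular softmax policies, an NPG step takes the multiplicative form $\pi'(a) \propto \pi(a)\exp(\eta Q(a))$. The count-based bonus in `optimistic_q`, along with the names `beta`, `q_max`, and `eta`, are illustrative assumptions; the abstract does not specify how the optimistic estimates are constructed.

```python
import math
from typing import List


def optimistic_q(q_hat: List[float], counts: List[int],
                 beta: float, q_max: float) -> List[float]:
    """Inflate estimated action values with an exploration bonus.

    Assumption: a count-based bonus beta / sqrt(n), clipped to [0, q_max].
    This is a hypothetical stand-in for an optimistic evaluation subroutine.
    """
    return [
        min(q_max, max(0.0, q + beta / math.sqrt(max(n, 1))))
        for q, n in zip(q_hat, counts)
    ]


def npg_step(policy: List[float], q_values: List[float],
             eta: float) -> List[float]:
    """One softmax NPG update for a single state.

    Implements pi'(a) proportional to pi(a) * exp(eta * Q(a)).
    """
    # Subtracting the max leaves the normalized result unchanged
    # and avoids overflow in exp.
    q_ref = max(q_values)
    weights = [p * math.exp(eta * (q - q_ref))
               for p, q in zip(policy, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Two actions with equal estimated value; action 1 was tried only once,
    # so its bonus is larger and the update shifts probability toward it.
    q_bar = optimistic_q([0.2, 0.2], counts=[100, 1], beta=1.0, q_max=1.0)
    print(npg_step([0.5, 0.5], q_bar, eta=1.0))
```

In this toy setting the rarely tried action receives the larger bonus, so the update moves probability mass toward it; that is the sense in which optimism encourages exploration.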

