使用线性函数近似化的统一-PAC强化学习界径 (Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation) - 专知论文

会员服务 ·

0

近似 · 线性的 · 泛函 · PAC学习理论 · 强化学习 ·

2021 年 12 月 31 日

Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation

翻译：使用线性函数近似化的统一-PAC强化学习界径

Jiafan He,Dongruo Zhou,Quanquan Gu

from arxiv, 27 pages. In NeurIPS 2021

We study reinforcement learning (RL) with linear function approximation. Existing algorithms for this problem only have high-probability regret and/or Probably Approximately Correct (PAC) sample complexity guarantees, which cannot guarantee the convergence to the optimal policy. In this paper, in order to overcome the limitation of existing algorithms, we propose a new algorithm called FLUTE, which enjoys uniform-PAC convergence to the optimal policy with high probability. The uniform-PAC guarantee is the strongest possible guarantee for reinforcement learning in the literature, which can directly imply both PAC and high probability regret bounds, making our algorithm superior to all existing algorithms with linear function approximation. At the core of our algorithm is a novel minimax value function estimator and a multi-level partition scheme to select the training samples from historical observations. Both of these techniques are new and of independent interest.

翻译：我们用线性函数近似值研究强化学习(RL) 。这一问题的现有算法只有高概率的遗憾和/或很可能是近似正确(PAC)样本复杂性保证,无法保证与最佳政策趋同。在本文中,为了克服现有算法的局限性,我们提议了一个新的算法,称为FLUTE,它具有与最佳政策的统一-PAC趋同的概率很高。统一-PAC担保是文献中强化学习的最有力保障,它可能直接暗示PAC和高概率遗憾界限,使我们的算法优于所有现有具有线性函数近似(PAC)的算法。在我们的算法中,我们的核心是一个新的微缩运算法值估计器和从历史观察中选择培训样本的多层次分割法。这两种技术都是新的,都具有独立的兴趣。

0

相关内容

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

半监督进化文本聚类算法在动态多源文本分析上的研究

国家自然科学基金

2+阅读 · 2014年12月31日

BRCA1蛋白出核的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

复杂动态场景中鲁棒的视觉跟踪算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

利用参量结构实现复杂信号环境下盲信号分离方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于肺结节多正交位CT图像Curvelet纹理构建 Gradient Boosting 集成预测模型

国家自然科学基金

0+阅读 · 2011年12月31日

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

Arxiv

0+阅读 · 2022年4月19日

Optimal bounds for numerical approximations of infinite horizon problems based on dynamic programming approach

Arxiv

1+阅读 · 2022年4月19日

Covariance Estimation for Matrix-valued Data

Arxiv

0+阅读 · 2022年4月18日

Active Learning with Weak Labels for Gaussian Processes

Arxiv

2+阅读 · 2022年4月18日

Information-theoretic generalization bounds for black-box learning algorithms

Arxiv

12+阅读 · 2021年10月4日

VIP会员

文章信息

相关主题

PAC学习理论

相关VIP内容

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

《北约认知战概念报告》

《预测促成大规模货运无人机的技术趋势与影响》报告

美海军放弃星座级转而采用国家安全巡逻舰设计

《北约作战弹性概念》报告

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

Arxiv

0+阅读 · 2022年4月19日

Optimal bounds for numerical approximations of infinite horizon problems based on dynamic programming approach

Arxiv

1+阅读 · 2022年4月19日

Covariance Estimation for Matrix-valued Data

Arxiv

0+阅读 · 2022年4月18日

Active Learning with Weak Labels for Gaussian Processes

Arxiv

2+阅读 · 2022年4月18日

Information-theoretic generalization bounds for black-box learning algorithms

Arxiv

12+阅读 · 2021年10月4日

相关基金

半监督进化文本聚类算法在动态多源文本分析上的研究

国家自然科学基金

2+阅读 · 2014年12月31日

BRCA1蛋白出核的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

复杂动态场景中鲁棒的视觉跟踪算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

利用参量结构实现复杂信号环境下盲信号分离方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于肺结节多正交位CT图像Curvelet纹理构建 Gradient Boosting 集成预测模型

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员