Bilinear Classes: A Structural Framework for Provable Generalization in RL - 专知论文


Sample complexity · Generalization theory · Linear · MoDELS · Samples

March 19, 2021


Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang

This work introduces Bilinear Classes, a new structural framework that permits generalization in reinforcement learning in a wide variety of settings through the use of function approximation. The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable and, notably, also includes new models, such as the Linear $Q^*/V^*$ model, in which both the optimal $Q$-function and the optimal $V$-function are linear in some known feature space. Our main result provides an RL algorithm that has polynomial sample complexity for Bilinear Classes; notably, this sample complexity is stated in terms of a reduction to the generalization error of an underlying supervised learning sub-problem. These bounds nearly match the best known sample complexity bounds for existing models. Furthermore, this framework also extends to the infinite-dimensional (RKHS) setting: for the Linear $Q^*/V^*$ model, linear MDPs, and linear mixture MDPs, we provide sample complexities that have no explicit dependence on the feature dimension (which could be infinite), but instead depend only on information-theoretic quantities.
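As a rough illustration of what "linear in some known feature space" could mean for the Linear $Q^*/V^*$ model, the condition can be written out as below. This is a sketch under stated assumptions, not the paper's formal definition: the feature maps $\phi$ and $\psi$, the weight vectors $w^*$ and $\theta^*$, and the dimension $d$ are notation introduced here for illustration and do not appear in the abstract.

```latex
% Assumed notation (not from the abstract):
%   \phi(s,a) \in \mathbb{R}^d : known state-action feature map
%   \psi(s)   \in \mathbb{R}^d : known state feature map
%   w^*, \theta^* \in \mathbb{R}^d : unknown weight vectors
\begin{align*}
  Q^*(s,a) &= \langle w^*, \phi(s,a) \rangle
     && \text{for all states } s \text{ and actions } a, \\
  V^*(s)   &= \langle \theta^*, \psi(s) \rangle
     && \text{for all states } s, \\
  V^*(s)   &= \max_{a} Q^*(s,a)
     && \text{(standard relation between } V^* \text{ and } Q^*\text{)}.
\end{align*}
```

The first two lines are the linearity assumptions; the third is the usual definition linking the optimal value functions, which both linear forms must satisfy simultaneously.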


