This paper is concerned with offline reinforcement learning (RL), which learns from pre-collected data without further exploration. Effective offline RL must accommodate distribution shift and limited data coverage. However, prior algorithms and analyses either suffer from suboptimal sample complexities or incur high burn-in costs before reaching sample optimality, posing an impediment to efficient offline RL in sample-starved applications. We demonstrate that the model-based (or "plug-in") approach achieves minimax-optimal sample complexity without burn-in cost for tabular Markov decision processes (MDPs). Concretely, consider a finite-horizon (resp. $\gamma$-discounted infinite-horizon) MDP with $S$ states and horizon $H$ (resp. effective horizon $\frac{1}{1-\gamma}$), and suppose the distribution shift of the data is reflected by some single-policy clipped concentrability coefficient $C^{\star}_{\text{clipped}}$. We prove that model-based offline RL yields $\varepsilon$-accuracy with a sample complexity of \[ \begin{cases} \frac{H^{4}SC^{\star}_{\text{clipped}}}{\varepsilon^{2}} & (\text{finite-horizon MDPs}) \\ \frac{SC^{\star}_{\text{clipped}}}{(1-\gamma)^{3}\varepsilon^{2}} & (\text{infinite-horizon MDPs}) \end{cases} \] up to log factors, which is minimax optimal for the entire $\varepsilon$-range. The proposed algorithms are ``pessimistic'' variants of value iteration with Bernstein-style penalties, and do not require sophisticated variance-reduction schemes. Our analysis framework is established upon delicate leave-one-out decoupling arguments in conjunction with careful self-bounding techniques tailored to MDPs.
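For intuition, a pessimistic variant of value iteration with a Bernstein-style penalty can be sketched as below for the finite-horizon tabular setting. This is a minimal illustrative sketch, not the paper's exact algorithm: the function name, the penalty constant `c`, the confidence level `delta`, and the precise form of the bonus are assumptions chosen for readability. The key ingredients match the abstract: empirical transitions estimated from offline data (the "plug-in" model), and a variance-aware lower-confidence penalty subtracted from the Q-values so that poorly covered state-action pairs are treated pessimistically.

```python
import numpy as np

def pessimistic_value_iteration(counts, rewards, H, c=1.0, delta=0.1):
    """Illustrative pessimistic value iteration with a Bernstein-style penalty.

    counts[h, s, a, s'] : empirical transition counts from the offline dataset
    rewards[h, s, a]    : known deterministic rewards in [0, 1]
    Returns pessimistic value estimates V[h, s] and a greedy policy pi[h, s].
    """
    _, S, A, _ = counts.shape
    V = np.zeros((H + 1, S))            # V[H] = 0 (terminal)
    pi = np.zeros((H, S), dtype=int)
    log_term = np.log(max(2.0, H * S * A / delta))
    for h in range(H - 1, -1, -1):      # backward induction over the horizon
        Q = np.zeros((S, A))
        for s in range(S):
            for a in range(A):
                n = counts[h, s, a].sum()
                if n == 0:
                    Q[s, a] = 0.0       # no coverage: maximally pessimistic
                    continue
                p_hat = counts[h, s, a] / n            # plug-in transition model
                mean = p_hat @ V[h + 1]
                var = p_hat @ V[h + 1] ** 2 - mean ** 2  # empirical variance of V
                # Bernstein-style penalty: variance term + higher-order 1/n term
                b = c * (np.sqrt(var * log_term / n) + H * log_term / n)
                Q[s, a] = max(0.0, rewards[h, s, a] + mean - b)
        pi[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return V, pi
```

The penalty combines a $\sqrt{\mathrm{Var}/n}$ term with a lower-order $H/n$ term, mirroring Bernstein's inequality; this variance awareness is what removes the burn-in cost without resorting to variance-reduction machinery.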