This paper concerns the central issues of model robustness and sample efficiency in offline reinforcement learning (RL), which aims to learn to perform decision making from history data without active exploration. Due to the uncertainties and variabilities of the environment, it is critical to learn a robust policy -- with as few samples as possible -- that performs well even when the deployed environment deviates from the nominal one used to collect the history dataset. We consider a distributionally robust formulation of offline RL, focusing on tabular robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler (KL) divergence in both finite-horizon and infinite-horizon settings. To combat sample scarcity, we propose a model-based algorithm that combines distributionally robust value iteration with the principle of pessimism in the face of uncertainty, by penalizing the robust value estimates with a carefully designed data-driven penalty term. Under a mild and tailored assumption on the history dataset that measures distribution shift without requiring full coverage of the state-action space, we establish the finite-sample complexity of the proposed algorithm, and further show that it is almost unimprovable in light of a nearly-matching information-theoretic lower bound, up to a polynomial factor of the (effective) horizon length. To the best of our knowledge, this provides the first provably near-optimal robust offline RL algorithm that learns under model uncertainty and partial coverage.
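For concreteness, a schematic sketch of the kind of update the abstract alludes to, stated in illustrative notation (here $\sigma$ denotes the KL radius of the uncertainty set, $\widehat{P}^{0}_{h,s,a}$ an empirical nominal transition distribution estimated from the history data, and $b_h(s,a)$ a data-driven penalty; the exact penalty and estimator are specified in the main text). By strong duality for KL-constrained distributionally robust optimization, the robust Bellman backup over the uncertainty set admits a one-dimensional dual form, to which the pessimism penalty is then applied:
\[
\inf_{P:\,\mathrm{KL}(P\,\|\,\widehat{P}^{0}_{h,s,a})\le \sigma} \mathbb{E}_{s'\sim P}\big[\widehat{V}_{h+1}(s')\big]
= \sup_{\lambda\ge 0}\Big\{ -\lambda \log \mathbb{E}_{s'\sim \widehat{P}^{0}_{h,s,a}}\big[e^{-\widehat{V}_{h+1}(s')/\lambda}\big] - \lambda\sigma \Big\},
\]
\[
\widehat{Q}_h(s,a) = \max\Big\{ r_h(s,a) + \sup_{\lambda\ge 0}\Big\{ -\lambda \log \mathbb{E}_{s'\sim \widehat{P}^{0}_{h,s,a}}\big[e^{-\widehat{V}_{h+1}(s')/\lambda}\big] - \lambda\sigma \Big\} - b_h(s,a),\ 0\Big\},
\qquad
\widehat{V}_h(s) = \max_a \widehat{Q}_h(s,a).
\]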