Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation - 专知 (Zhuanzhi) Papers


Learning · Linear · Robustness · Functional · Approximation

September 14, 2022

Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation


Xiaoteng Ma, Zhipeng Liang, Li Xia, Jiheng Zhang, Jose Blanchet, Mingwen Liu, Qianchuan Zhao, Zhengyuan Zhou

From arXiv. The first two authors contributed equally.

Among the reasons that hinder the application of reinforcement learning (RL) to real-world problems, two factors are critical: limited data and the mismatch between the testing environment and the training one. In this paper, we attempt to address both issues simultaneously through the problem setup of distributionally robust offline RL. Specifically, we learn an RL agent from historical data collected in the source environment and optimize it to perform well in a perturbed one. Moreover, we adopt linear function approximation so that the algorithm applies to large-scale problems. We prove that our algorithm achieves a suboptimality of $O(1/\sqrt{K})$, with a dependence on the linear function dimension $d$, which appears to be the first result with a sample complexity guarantee in this setting. Diverse experiments are conducted to support our theoretical findings, showing the superiority of our algorithm over its non-robust counterpart.
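To make "perform well in the perturbed environment" concrete, the sketch below computes a worst-case expected value over all distributions close to a nominal one. The abstract does not say which uncertainty set the paper uses, so the KL-divergence ball here is an assumption, and `kl_robust_expectation` is a hypothetical helper written for illustration, not the authors' algorithm. It relies on the standard dual form of the KL-constrained problem, $\sup_{\beta > 0} \{-\beta \log E_P[e^{-V/\beta}] - \beta\rho\}$, approximated by a grid search over $\beta$.

```python
import math


def kl_robust_expectation(values, probs, rho, betas=None):
    """Approximate inf over Q with KL(Q || P) <= rho of E_Q[V].

    `values` and `probs` describe the nominal distribution P over
    next-state values V. The KL ball of radius `rho` is an assumed
    uncertainty set, not one taken from the abstract.
    """
    # Restrict to the support of P, since Q must be absolutely continuous w.r.t. P.
    support = [(v, p) for v, p in zip(values, probs) if p > 0]
    if betas is None:
        # Log-spaced grid from 1e-3 to 1e4 for the dual variable beta.
        betas = [10 ** (k / 10) for k in range(-30, 41)]
    v_min = min(v for v, _ in support)
    best = v_min  # limit of the dual objective as beta -> 0
    for beta in betas:
        # E_P[exp(-V / beta)], shifted by v_min for numerical stability.
        s = sum(p * math.exp(-(v - v_min) / beta) for v, p in support)
        dual = v_min - beta * math.log(s) - beta * rho
        best = max(best, dual)
    return best


if __name__ == "__main__":
    values, probs = [1.0, 0.0], [0.5, 0.5]
    for rho in (0.0, 0.1, 0.5, 5.0):
        print(rho, kl_robust_expectation(values, probs, rho))
```

With `rho = 0` the result is close to the nominal mean, and it decreases toward the smallest value in the support as `rho` grows. Every grid point gives a lower bound on the true worst case, so the sketch errs on the pessimistic side. A robust Bellman backup would apply such an operator to the estimated next-state values in place of the ordinary expectation.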


