Near-optimal Policy Identification in Active Reinforcement Learning - 专知 Papers


Sample complexity · Learning · Optimizer · Reinforcement learning · Value iteration

December 19, 2022

Near-optimal Policy Identification in Active Reinforcement Learning


Xiang Li,Viraj Mehta,Johannes Kirschner,Ian Char,Willie Neiswanger,Jeff Schneider,Andreas Krause,Ilija Bogunovic

Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces. In cases where the transition dynamics can be readily evaluated at specified states (e.g., via a simulator), agents can operate in what is often referred to as planning with a generative model. We propose the AE-LSVI algorithm for best-policy identification, a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration (AE). AE-LSVI provably identifies a near-optimal policy uniformly over an entire state space and achieves polynomial sample complexity guarantees that are independent of the number of states. When specialized to the recently introduced offline contextual Bayesian optimization setting, our algorithm achieves improved sample complexity bounds. Experimentally, we demonstrate that AE-LSVI outperforms other RL algorithms in a variety of environments when robustness to the initial state is required.
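The abstract does not spell out the algorithm, so the following is only a rough sketch of how optimism and pessimism might be combined for active exploration in a one-step (contextual) setting with a generative model. It is not the authors' AE-LSVI implementation. The names `rbf_kernel`, `posterior` and `active_explore`, the RBF kernel, the unit prior variance, and the values of `lengthscale`, `noise` and `beta` are all assumptions made for illustration. Each round queries the state where the optimistic and pessimistic value estimates disagree most, and the returned policy acts greedily on the pessimistic estimate.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=0.3):
    """Squared-exponential kernel between rows of X and rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def posterior(X_obs, y_obs, X_query, noise=1e-2):
    """Kernel ridge mean and standard deviation (unit prior variance)."""
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    K_star = rbf_kernel(X_query, X_obs)
    mean = K_star @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.einsum("ij,ji->i", K_star, np.linalg.solve(K, K_star.T))
    return mean, np.sqrt(np.clip(var, 0.0, None))


def active_explore(reward_fn, states, actions, rounds=30, beta=2.0):
    """Hypothetical one-step active exploration loop (illustrative only)."""
    n_s, n_a = len(states), len(actions)
    grid = np.array([[s, a] for s in states for a in actions])

    # Seed the data set with one arbitrary query to the generative model.
    X_obs = grid[:1].copy()
    y_obs = np.array([reward_fn(*grid[0])])

    for _ in range(rounds):
        mean, std = posterior(X_obs, y_obs, grid)
        ucb = (mean + beta * std).reshape(n_s, n_a)  # optimistic estimate
        lcb = (mean - beta * std).reshape(n_s, n_a)  # pessimistic estimate

        # Pick the state whose optimistic and pessimistic values differ most,
        # then the optimistic action at that state.
        gap = ucb.max(axis=1) - lcb.max(axis=1)
        i = int(gap.argmax())
        j = int(ucb[i].argmax())

        X_obs = np.vstack([X_obs, [states[i], actions[j]]])
        y_obs = np.append(y_obs, reward_fn(states[i], actions[j]))

    # Final policy: greedy with respect to the pessimistic estimate.
    mean, std = posterior(X_obs, y_obs, grid)
    lcb = (mean - beta * std).reshape(n_s, n_a)
    return actions[lcb.argmax(axis=1)]
```

A call such as `active_explore(lambda s, a: -(a - s) ** 2, np.linspace(0, 1, 5), np.linspace(0, 1, 11))` returns one action per state. Choosing the query state by the gap, rather than sampling states from a fixed start distribution, is what targets accuracy over the whole state space in this toy version.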


