Meta Reinforcement Learning with Finite Training Tasks -- a Density Estimation Approach - Zhuanzhi (专知) Papers

June 21, 2022

Meta Reinforcement Learning with Finite Training Tasks -- a Density Estimation Approach

Zohar Rimon,Aviv Tamar,Gilad Adler

In meta reinforcement learning (meta RL), an agent learns from a set of training tasks how to quickly solve a new task drawn from the same task distribution. The optimal meta RL policy, a.k.a. the Bayes-optimal behavior, is well defined and guarantees optimal reward in expectation, taken with respect to the task distribution. The question we explore in this work is how many training tasks are required to guarantee approximately optimal behavior with high probability. Recent work provided the first such PAC analysis for a model-free setting, where a history-dependent policy was learned from the training tasks. In this work, we propose a different approach: directly learn the task distribution, using density estimation techniques, and then train a policy on the learned task distribution. We show that our approach leads to bounds that depend on the dimension of the task distribution. In particular, in settings where the task distribution lies in a low-dimensional manifold, we extend our analysis to use dimensionality reduction techniques and account for such structure, obtaining significantly better bounds than previous work, whose bounds depend strictly on the number of states and actions. The key to our approach is the regularization implied by the kernel density estimation method. We further demonstrate that this regularization is useful in practice when 'plugged in' to the state-of-the-art VariBAD meta RL algorithm.
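The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: fit a kernel density estimate (KDE) to the training tasks, then draw fresh tasks from it for policy training. It assumes each task can be summarized by a real-valued parameter vector, uses a Gaussian kernel, and picks the bandwidth with a Scott-style rule of thumb. All three are assumptions made for illustration, and the function names `fit_gaussian_kde`, `sample_tasks`, and `kde_density` are hypothetical rather than taken from the paper or from VariBAD.

```python
from typing import Optional, Tuple

import numpy as np

KDE = Tuple[np.ndarray, float]  # (training task vectors, bandwidth)


def fit_gaussian_kde(train_tasks: np.ndarray, bandwidth: Optional[float] = None) -> KDE:
    """Store the training tasks and choose a bandwidth for a Gaussian KDE.

    train_tasks must have shape (n_tasks, dim) with n_tasks >= 2.
    """
    tasks = np.asarray(train_tasks, dtype=float)
    n, d = tasks.shape
    if bandwidth is None:
        # Scott-style rule of thumb (an assumption of this sketch).
        sigma = tasks.std(axis=0, ddof=1).mean()
        bandwidth = sigma * n ** (-1.0 / (d + 4))
    return tasks, float(bandwidth)


def sample_tasks(kde: KDE, num_samples: int, rng: np.random.Generator) -> np.ndarray:
    """Sample from the KDE: pick a training task uniformly, then add Gaussian noise."""
    tasks, h = kde
    idx = rng.integers(0, len(tasks), size=num_samples)
    noise = rng.normal(scale=h, size=(num_samples, tasks.shape[1]))
    return tasks[idx] + noise


def kde_density(kde: KDE, query: np.ndarray) -> np.ndarray:
    """Evaluate the KDE at each row of query, which has shape (m, dim)."""
    tasks, h = kde
    d = tasks.shape[1]
    query = np.asarray(query, dtype=float)
    sq_dist = ((query[:, None, :] - tasks[None, :, :]) ** 2).sum(axis=-1)
    norm = (2.0 * np.pi * h**2) ** (d / 2.0)
    return np.exp(-sq_dist / (2.0 * h**2)).mean(axis=1) / norm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical example: 20 training tasks, each a 2-D goal position.
    train_tasks = rng.uniform(-1.0, 1.0, size=(20, 2))
    kde = fit_gaussian_kde(train_tasks)
    new_tasks = sample_tasks(kde, 1000, rng)
    print(new_tasks.shape)
```

In this sketch the sampled tasks are smoothed copies of the training tasks, which is one way to read the regularization effect the abstract attributes to KDE: the policy is trained on a neighborhood around each observed task rather than on the finite training set alone.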
