While reinforcement learning (RL) provides a framework for learning through trial and error, translating RL algorithms to the real world has remained challenging. A major hurdle to real-world application arises from the development of algorithms in an episodic setting, where the environment is reset after every trial, in contrast with the continual and non-episodic nature of the real world encountered by embodied agents such as humans and robots. Prior works have considered an alternating approach in which a forward policy learns to solve the task and a backward policy learns to reset the environment, but what initial state distribution should the backward policy reset the agent to? Assuming access to a few demonstrations, we propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations. This keeps the agent close to the task-relevant states, allowing for a mix of easy and difficult starting states for the forward policy. Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks from the EARL benchmark, with 40% gains on the hardest task, while making fewer assumptions than prior works.
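A minimal sketch of the state-distribution matching idea described above, not the authors' implementation: a binary classifier is trained to separate demonstration states from states visited by the backward policy, and its output defines a reward that pushes the backward policy toward demonstration-like states. The names `StateClassifier`, `demo_states`, and `backward_states`, as well as the specific GAIL-style reward shape, are illustrative assumptions.

```python
import torch
import torch.nn as nn


class StateClassifier(nn.Module):
    """Binary classifier estimating p(state comes from the demonstrations)."""

    def __init__(self, state_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.net(states)  # logits


def classifier_loss(clf: StateClassifier,
                    demo_states: torch.Tensor,
                    backward_states: torch.Tensor) -> torch.Tensor:
    """Demonstration states are labeled 1, backward-policy states 0."""
    bce = nn.BCEWithLogitsLoss()
    demo_loss = bce(clf(demo_states), torch.ones(len(demo_states), 1))
    policy_loss = bce(clf(backward_states), torch.zeros(len(backward_states), 1))
    return demo_loss + policy_loss


def backward_reward(clf: StateClassifier, states: torch.Tensor) -> torch.Tensor:
    """Reward for the backward policy: larger when a state looks like a demo state,
    which (approximately) drives its state distribution toward the demonstrations."""
    with torch.no_grad():
        logits = clf(states)
    # -log(1 - D(s)); other GAIL-style reward forms are possible.
    return -torch.nn.functional.logsigmoid(-logits)
```

In this sketch the classifier and the backward policy would be updated alternately: the classifier on fresh batches of demonstration and backward-policy states, and the backward policy with any off-the-shelf RL algorithm using `backward_reward` as its reward signal, while the forward policy continues to optimize the task reward.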