Imitation learning has demonstrated remarkable performance across various domains, yet it remains constrained by several prerequisites. The research community has worked intensively to relax these constraints, for example by introducing stochastic policies to avoid unseen states, eliminating the need for action labels, and learning from suboptimal demonstrations. Inspired by the natural reproduction process, we propose GenIL, a method that integrates a Genetic Algorithm with imitation learning. The Genetic Algorithm improves data efficiency by reproducing trajectories with varied returns and helps the model estimate more accurate and compact reward function parameters. We evaluated GenIL in both the Atari and Mujoco domains, and the results show that it outperforms previous extrapolation methods in extrapolation accuracy, robustness, and overall policy performance when input data is limited.
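To make the trajectory-reproduction idea concrete, the following is a minimal sketch in Python. The abstract only states that a Genetic Algorithm reproduces trajectories with varied returns; the single-point crossover, the Gaussian mutation, and the `Trajectory`/`reproduce` names below are illustrative assumptions, not the paper's actual operators.

```python
import random
from dataclasses import dataclass
from typing import List

import numpy as np

# Hypothetical sketch of GenIL-style trajectory reproduction. The crossover and
# mutation operators here are assumptions for illustration; the paper does not
# specify them in the abstract.

@dataclass
class Trajectory:
    states: np.ndarray  # shape (T, state_dim)
    ret: float          # return associated with the trajectory


def crossover(a: Trajectory, b: Trajectory) -> Trajectory:
    """Splice a prefix of one parent onto a suffix of the other (single-point)."""
    cut_a = random.randrange(1, len(a.states))
    cut_b = random.randrange(1, len(b.states))
    states = np.concatenate([a.states[:cut_a], b.states[cut_b:]])
    # Approximate the child's return by length-weighting the parents' returns.
    w = cut_a / (cut_a + (len(b.states) - cut_b))
    return Trajectory(states, w * a.ret + (1 - w) * b.ret)


def mutate(t: Trajectory, noise: float = 0.01) -> Trajectory:
    """Perturb states with small Gaussian noise to diversify offspring."""
    return Trajectory(t.states + noise * np.random.randn(*t.states.shape), t.ret)


def reproduce(demos: List[Trajectory], n_offspring: int) -> List[Trajectory]:
    """Grow a pool of trajectories with varied returns from a few demonstrations."""
    pool = list(demos)
    while len(pool) < len(demos) + n_offspring:
        parent_a, parent_b = random.sample(pool, 2)
        pool.append(mutate(crossover(parent_a, parent_b)))
    return pool
```

The enlarged pool, ordered by return, could then supervise a reward model via pairwise ranking in the style of prior reward-extrapolation work, which is consistent with, though not specified by, the abstract.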