从Void中产生的行为:未受监督的主动参加培训前行为 (Behavior From the Void: Unsupervised Active Pre-Training) - 专知论文

会员服务 ·

0

Performer · 无监督 · 回合 · Better · Atari ·

2021 年 3 月 8 日

Behavior From the Void: Unsupervised Active Pre-Training

翻译：从Void中产生的行为:未受监督的主动参加培训前行为

Liu Hao,Abbeel Pieter

We introduce a new unsupervised pre-training method for reinforcement learning called APT, which stands for Active Pre-Training. APT learns behaviors and representations by actively searching for novel states in reward-free environments. The key novel idea is to explore the environment by maximizing a non-parametric entropy computed in an abstract representation space, which avoids the challenging density modeling and consequently allows our approach to scale much better in environments that have high-dimensional observations (e.g., image observations). We empirically evaluate APT by exposing task-specific reward after a long unsupervised pre-training phase. On Atari games, APT achieves human-level performance on 12 games and obtains highly competitive performance compared to canonical fully supervised RL algorithms. On DMControl suite, APT beats all baselines in terms of asymptotic performance and data efficiency and dramatically improves performance on tasks that are extremely difficult to train from scratch.

翻译：我们引入了一个新的强化学习培训前不受监督的新方法,称为APT, 即积极培训前, APT通过积极寻找无报酬环境中的新国家来学习行为和表现。关键的新想法是探索环境,在抽象的表达空间中最大限度地使用非参数性昆虫,从而避免具有挑战性的密度模型,从而使我们能够在具有高层次观测(例如图像观察)的环境中进行更好的规模评估。我们从经验上评价APT, 方法是在长期未经监督的培训前阶段之后暴露具体任务的报酬。在Atari游戏中,APT在12场比赛中取得了人的水平表现,并获得了与完全受监督的RL算法相比具有高度竞争力的性能。在DM Control套中,APT在无症状性表现和数据效率方面超越了所有基线,并极大地改进了从零开始培训极为困难的任务的绩效。

0

相关内容

Performer

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

【CVPR2020-Oral】无监督域内自适应语义分割，Unsupervised Intra-domain Adaptation

【CVPR2020-Oral】无监督域内自适应语义分割，Unsupervised Intra-domain Adaptation

专知会员服务

71+阅读 · 2020年4月20日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

【伯克利】最新《深度半监督学习》总述，146页ppt，Semi-Supervised Learning

【伯克利】最新《深度半监督学习》总述，146页ppt，Semi-Supervised Learning

专知会员服务

147+阅读 · 2020年4月11日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【ICML2019 tutorial】主动学习:从理论到实践（Active Learning: From Theory to Practice），Robert Nowak，Steve Hanneke

【ICML2019 tutorial】主动学习:从理论到实践（Active Learning: From Theory to Practice），Robert Nowak，Steve Hanneke

专知会员服务

48+阅读 · 2019年6月10日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Imitating Targets from all sides: An Unsupervised Transfer Learning method for Person Re-identification

Arxiv

0+阅读 · 2021年4月28日

Return-Based Contrastive Representation Learning for Reinforcement Learning

Arxiv

10+阅读 · 2021年2月22日

Contrastive Clustering

Arxiv

31+阅读 · 2020年9月21日

CURL: Contrastive Unsupervised Representations for Reinforcement Learning

Arxiv

17+阅读 · 2020年4月28日

Unsupervised Domain Clusters in Pretrained Language Models

Arxiv

11+阅读 · 2020年4月5日

Continual Unsupervised Representation Learning

Continual Unsupervised Representation Learning

Arxiv

7+阅读 · 2019年10月31日

Unsupervised Data Augmentation for Consistency Training

Arxiv

5+阅读 · 2019年7月10日

Invariant Information Distillation for Unsupervised Image Segmentation and Clustering

Invariant Information Distillation for Unsupervised Image Segmentation and Clustering

Arxiv

5+阅读 · 2018年7月21日

Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings

Arxiv

6+阅读 · 2018年6月7日

Learning Unsupervised Learning Rules

Arxiv

7+阅读 · 2018年5月23日

VIP会员

文章信息

相关主题

相关VIP内容

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

【CVPR2020-Oral】无监督域内自适应语义分割，Unsupervised Intra-domain Adaptation

【CVPR2020-Oral】无监督域内自适应语义分割，Unsupervised Intra-domain Adaptation

专知会员服务

71+阅读 · 2020年4月20日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

【伯克利】最新《深度半监督学习》总述，146页ppt，Semi-Supervised Learning

【伯克利】最新《深度半监督学习》总述，146页ppt，Semi-Supervised Learning

专知会员服务

147+阅读 · 2020年4月11日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【ICML2019 tutorial】主动学习:从理论到实践（Active Learning: From Theory to Practice），Robert Nowak，Steve Hanneke

【ICML2019 tutorial】主动学习:从理论到实践（Active Learning: From Theory to Practice），Robert Nowak，Steve Hanneke

专知会员服务

48+阅读 · 2019年6月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《复杂工程系统模型驱动设计决策支持系统：早期设计阶段挑战》最新138页

《日本陆上自卫队2040年作战方式与未来作战研究》最新23页slides

人工智能作为战争武器

《后勤保障》最新23页

相关资讯

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

Imitating Targets from all sides: An Unsupervised Transfer Learning method for Person Re-identification

Arxiv

0+阅读 · 2021年4月28日

Return-Based Contrastive Representation Learning for Reinforcement Learning

Arxiv

10+阅读 · 2021年2月22日

Contrastive Clustering

Arxiv

31+阅读 · 2020年9月21日

CURL: Contrastive Unsupervised Representations for Reinforcement Learning

Arxiv

17+阅读 · 2020年4月28日

Unsupervised Domain Clusters in Pretrained Language Models

Arxiv

11+阅读 · 2020年4月5日

Continual Unsupervised Representation Learning

Continual Unsupervised Representation Learning

Arxiv

7+阅读 · 2019年10月31日

Unsupervised Data Augmentation for Consistency Training

Arxiv

5+阅读 · 2019年7月10日

Invariant Information Distillation for Unsupervised Image Segmentation and Clustering

Invariant Information Distillation for Unsupervised Image Segmentation and Clustering

Arxiv

5+阅读 · 2018年7月21日

Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings

Arxiv

6+阅读 · 2018年6月7日

Learning Unsupervised Learning Rules

Arxiv

7+阅读 · 2018年5月23日

微信扫码咨询专知VIP会员