Learning meaningful behaviors in the absence of reward is a difficult problem in reinforcement learning. A desirable and challenging unsupervised objective is to learn a set of diverse skills that provide thorough coverage of the state space while being directed, i.e., reliably reaching distinct regions of the environment. In this paper, we build on the mutual information framework for skill discovery and introduce UPSIDE, which addresses the coverage-directedness trade-off in the following ways: 1) We design policies with a decoupled structure: a directed skill, trained to reach a specific region, followed by a diffusing part that induces local coverage around it. 2) We optimize policies by maximizing their number under the constraint that each of them reaches a distinct region of the environment (i.e., the policies are sufficiently discriminable), and we prove that this serves as a lower bound on the original mutual information objective. 3) Finally, we compose the learned directed skills into a growing tree that adaptively covers the environment. We illustrate in several navigation and control environments how the skills learned by UPSIDE solve sparse-reward downstream tasks better than existing baselines.
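For concreteness, here is a minimal sketch of the constrained objective behind point 2, written in standard skill-discovery notation; the discriminator $q_\phi$, the threshold $\eta$, and the policies $\pi_1, \dots, \pi_N$ are illustrative symbols rather than the paper's exact notation. With a skill index $Z$ drawn uniformly and states $S$ visited by the corresponding policy, the mutual information admits the usual variational lower bound through a learned discriminator:
\[
I(S;Z) \;\ge\; \mathcal{H}(Z) \,+\, \mathbb{E}_{z \sim p(z)}\, \mathbb{E}_{s \sim \pi_z}\!\big[ \log q_\phi(z \mid s) \big].
\]
The reformulation described above then maximizes the number of skills subject to each one being sufficiently discriminable:
\[
\max_{N,\, \{\pi_n\},\, \phi} \; N
\quad \text{s.t.} \quad
\mathbb{E}_{s \sim \pi_n}\!\big[ q_\phi(n \mid s) \big] \;\ge\; \eta
\quad \text{for all } n \in \{1, \dots, N\}.
\]
Intuitively, with a uniform prior over $N$ skills that each achieve discriminability at least $\eta$, the bound scales as $\log(N\eta)$, so growing $N$ under the constraint tightens the mutual information objective.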