Exploration remains a central challenge for reinforcement learning (RL). Virtually all existing methods share the feature of a monolithic behaviour policy that changes only gradually (at best). In contrast, the exploratory behaviours of animals and humans exhibit a rich diversity, namely including forms of switching between modes. This paper presents an initial study of mode-switching, non-monolithic exploration for RL. We investigate different modes to switch between, at what timescales it makes sense to switch, and what signals make for good switching triggers. We also propose practical algorithmic components that make the switching mechanism adaptive and robust, which enables flexibility without an accompanying hyper-parameter-tuning burden. Finally, we report a promising and detailed analysis on Atari, using two-mode exploration and switching at sub-episodic time-scales.
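To make the idea of two-mode, non-monolithic exploration concrete, the sketch below switches between an "exploit" mode (greedy with respect to action values) and an "explore" mode (uniform-random actions) at sub-episodic timescales. This is a minimal illustration under simplifying assumptions, not the paper's exact algorithm: the class name `TwoModeExplorer`, the geometric mode durations, and the specific modes are all illustrative choices.

```python
import random


class TwoModeExplorer:
    """Illustrative sketch of two-mode, non-monolithic exploration.

    Alternates between an 'exploit' mode (greedy w.r.t. Q-values) and an
    'explore' mode (uniform-random actions). The switching trigger here, a
    randomly drawn mode duration, is a simplifying assumption; the paper
    studies richer, adaptive trigger signals.
    """

    def __init__(self, n_actions, mean_explore_len=10, mean_exploit_len=90, seed=0):
        self.n_actions = n_actions
        self.rng = random.Random(seed)
        self.mean_len = {"explore": mean_explore_len, "exploit": mean_exploit_len}
        self.mode = "exploit"
        self.steps_left = self._draw_duration()

    def _draw_duration(self):
        # Geometric durations yield memoryless, sub-episodic switch points
        # with the desired mean length per mode.
        p = 1.0 / self.mean_len[self.mode]
        k = 1
        while self.rng.random() > p:
            k += 1
        return k

    def act(self, q_values):
        # Trigger: switch modes once the current mode's duration runs out.
        if self.steps_left == 0:
            self.mode = "explore" if self.mode == "exploit" else "exploit"
            self.steps_left = self._draw_duration()
        self.steps_left -= 1
        if self.mode == "explore":
            return self.rng.randrange(self.n_actions)
        # Exploit: greedy action under the current value estimates.
        return max(range(self.n_actions), key=lambda a: q_values[a])
```

With a long mean exploit duration and a short mean explore duration, behaviour is mostly greedy but is punctuated by brief random-action bursts within an episode, which is the basic shape of the two-mode, sub-episodic switching the abstract describes.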