由随机网络蒸馏公司进行反勘探 (Anti-Exploration by Random Network Distillation) - 专知论文

会员服务 ·

0

Networking · 蒸馏 · 判别器 · Performer · 估计/估计量 ·

2023 年 1 月 31 日

Anti-Exploration by Random Network Distillation

翻译：由随机网络蒸馏公司进行反勘探

Alexander Nikulin,Vladislav Kurenkov,Denis Tarasov,Sergey Kolesnikov

from arxiv, Source code: https://github.com/tinkoff-ai/sac-rnd

Despite the success of Random Network Distillation (RND) in various domains, it was shown as not discriminative enough to be used as an uncertainty estimator for penalizing out-of-distribution actions in offline reinforcement learning. In this paper, we revisit these results and show that, with a naive choice of conditioning for the RND prior, it becomes infeasible for the actor to effectively minimize the anti-exploration bonus and discriminativity is not an issue. We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM), resulting in a simple and efficient ensemble-free algorithm based on Soft Actor-Critic. We evaluate it on the D4RL benchmark, showing that it is capable of achieving performance comparable to ensemble-based methods and outperforming ensemble-free approaches by a wide margin.

翻译：尽管随机网络蒸馏(RND)在不同领域取得了成功,但事实证明,它不够具有歧视性,不足以用作一种不确定性的估测器,用以惩罚非在线强化学习中的分配外行动。在本文中,我们重新审视了这些结果,并表明,由于对RND之前的附加条件做出了天真的选择,行为者无法有效地尽量减少反勘探奖金,而歧视也不是一个问题。我们表明,如果以精致的线性移动(FILM)为条件,从而形成一种以Soft Allor-Critic为基础的简单而高效的无共通算法,这一限制是可以避免的。我们用D4RL基准来评估它,表明它能够达到与共用方法相类似的性,并且能够以宽广的幅度实现超效的无共用方法。

0

相关内容

Networking

Networking：IFIP International Conferences on Networking。 Explanation：国际网络会议。 Publisher：IFIP。 SIT： http://dblp.uni-trier.de/db/conf/networking/index.html

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【硬核书】树与网络上的概率，716页pdf

【硬核书】树与网络上的概率，716页pdf

专知会员服务

77+阅读 · 2021年12月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

掺杂离子再分布调控Whitlockite型荧光粉发光性质的研究

国家自然科学基金

0+阅读 · 2015年12月31日

受Mittag-Lef？er噪声激励的广义朗之万方程的随机共振研究

国家自然科学基金

0+阅读 · 2015年12月31日

GTAT4和Myocardin相互作用调控心肌肥厚

国家自然科学基金

0+阅读 · 2014年12月31日

Mcl-1 蛋白稳定性的调控机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

新泛素化修饰因子对Hedgehog信号通路调控机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Lnc-RP11-611D20.2调控Notch1影响肿瘤细胞增殖在急性淋巴细胞白血病中的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于金刚石“氮-空位”中心的腔量子电动力学研究

国家自然科学基金

0+阅读 · 2012年12月31日

Tudor-SN蛋白调控细胞周期的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

编码密码学中若干组合对象研究

国家自然科学基金

0+阅读 · 2009年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

Dense Network Expansion for Class Incremental Learning

Dense Network Expansion for Class Incremental Learning

Arxiv

1+阅读 · 2023年3月22日

Inexact iterative numerical linear algebra for neural network-based spectral estimation and rare-event prediction

Arxiv

0+阅读 · 2023年3月22日

Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL

Arxiv

0+阅读 · 2023年3月21日

A closer look at the training dynamics of knowledge distillation

Arxiv

0+阅读 · 2023年3月20日

Attention Disturbance and Dual-Path Constraint Network for Occluded Person Re-Identification

Arxiv

0+阅读 · 2023年3月20日

Zero-shot Transferable and Persistently Feasible Safe Control for High Dimensional Systems by Consistent Abstraction

Arxiv

0+阅读 · 2023年3月17日

TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction

Arxiv

0+阅读 · 2023年3月17日

Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information

Arxiv

10+阅读 · 2021年10月4日

GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training

Arxiv

14+阅读 · 2021年2月16日

Prime Sample Attention in Object Detection

Arxiv

13+阅读 · 2019年4月9日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【硬核书】树与网络上的概率，716页pdf

【硬核书】树与网络上的概率，716页pdf

专知会员服务

77+阅读 · 2021年12月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【牛津博士论文】零样本强化学习综述

《美军条令：陆军指挥官与规划人员地理空间指南》60页

战术边缘指挥控制：防务面临的核心挑战

迈向开放世界检测：综述

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Dense Network Expansion for Class Incremental Learning

Dense Network Expansion for Class Incremental Learning

Arxiv

1+阅读 · 2023年3月22日

Inexact iterative numerical linear algebra for neural network-based spectral estimation and rare-event prediction

Arxiv

0+阅读 · 2023年3月22日

Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL

Arxiv

0+阅读 · 2023年3月21日

A closer look at the training dynamics of knowledge distillation

Arxiv

0+阅读 · 2023年3月20日

Attention Disturbance and Dual-Path Constraint Network for Occluded Person Re-Identification

Arxiv

0+阅读 · 2023年3月20日

Zero-shot Transferable and Persistently Feasible Safe Control for High Dimensional Systems by Consistent Abstraction

Arxiv

0+阅读 · 2023年3月17日

TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction

Arxiv

0+阅读 · 2023年3月17日

Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information

Arxiv

10+阅读 · 2021年10月4日

GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training

Arxiv

14+阅读 · 2021年2月16日

Prime Sample Attention in Object Detection

Arxiv

13+阅读 · 2019年4月9日

相关基金

掺杂离子再分布调控Whitlockite型荧光粉发光性质的研究

国家自然科学基金

0+阅读 · 2015年12月31日

受Mittag-Lef？er噪声激励的广义朗之万方程的随机共振研究

国家自然科学基金

0+阅读 · 2015年12月31日

GTAT4和Myocardin相互作用调控心肌肥厚

国家自然科学基金

0+阅读 · 2014年12月31日

Mcl-1 蛋白稳定性的调控机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

新泛素化修饰因子对Hedgehog信号通路调控机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Lnc-RP11-611D20.2调控Notch1影响肿瘤细胞增殖在急性淋巴细胞白血病中的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于金刚石“氮-空位”中心的腔量子电动力学研究

国家自然科学基金

0+阅读 · 2012年12月31日

Tudor-SN蛋白调控细胞周期的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

编码密码学中若干组合对象研究

国家自然科学基金

0+阅读 · 2009年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员