An inherent problem in reinforcement learning is coping with policies that are uncertain about which action to take (or about the value of a state). Model uncertainty, more formally known as epistemic uncertainty, refers to the expected prediction error of a model beyond the sampling noise. In this paper, we propose a metric for epistemic uncertainty estimation in Q-value functions, which we term pathwise epistemic uncertainty. We further develop a method to compute its approximate upper bound, which we call F-value. We experimentally apply the latter to Deep Q-Networks (DQN) and show that uncertainty estimation in reinforcement learning serves as a useful indication of learning progress. We then propose a new approach to improving sample efficiency in actor-critic algorithms by learning from an existing (previously learned or hard-coded) oracle policy while uncertainty is high, aiming to avoid unproductive random actions during training. We term this approach Critic Confidence Guided Exploration (CCGE). We implement CCGE on Soft Actor-Critic (SAC) using our F-value metric, apply it to a handful of popular Gym environments, and show that it achieves better sample efficiency and total episodic reward than vanilla SAC in limited contexts.
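The guided-exploration idea can be illustrated with a minimal sketch: act on the oracle policy while the critic's epistemic uncertainty is high, and switch to the learned policy once the critic is confident. The sketch below is an illustration only, not the paper's implementation; the names `policy`, `oracle_policy`, `critic_uncertainty` (standing in for an F-value estimate), and `threshold` are assumed placeholders.

```python
import torch

def select_action(state, policy, oracle_policy, critic_uncertainty, threshold):
    """Hypothetical CCGE-style action selection: defer to the oracle while uncertain."""
    with torch.no_grad():
        uncertainty = critic_uncertainty(state)  # e.g., an F-value-like uncertainty estimate
        if uncertainty > threshold:
            return oracle_policy(state)          # lean on the oracle while the critic is uncertain
        return policy(state)                     # trust the learned policy once confidence is high
```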