模型 -- -- 价值不一致,作为突发性不确定性的信号 (Model-Value Inconsistency as a Signal for Epistemic Uncertainty) - 专知论文

会员服务 ·

0

估计/估计量 · 价值函数 · 泛函 · MoDELS · 集成 ·

2021 年 12 月 8 日

Model-Value Inconsistency as a Signal for Epistemic Uncertainty

翻译：模型 -- -- 价值不一致,作为突发性不确定性的信号

Angelos Filos,Eszter Vértes,Zita Marinho,Gregory Farquhar,Diana Borsa,Abram Friesen,Feryal Behbahani,Tom Schaul,André Barreto,Simon Osindero

from arxiv, The first three authors contributed equally

Using a model of the environment and a value function, an agent can construct many estimates of a state's value, by unrolling the model for different lengths and bootstrapping with its value function. Our key insight is that one can treat this set of value estimates as a type of ensemble, which we call an \emph{implicit value ensemble} (IVE). Consequently, the discrepancy between these estimates can be used as a proxy for the agent's epistemic uncertainty; we term this signal \emph{model-value inconsistency} or \emph{self-inconsistency} for short. Unlike prior work which estimates uncertainty by training an ensemble of many models and/or value functions, this approach requires only the single model and value function which are already being learned in most model-based reinforcement learning algorithms. We provide empirical evidence in both tabular and function approximation settings from pixels that self-inconsistency is useful (i) as a signal for exploration, (ii) for acting safely under distribution shifts, and (iii) for robustifying value-based planning with a model.

翻译：使用环境和值函数模型, 代理商可以构建国家值的许多估计值, 方法是为不同长度的模型打开滚动, 并使用其值函数。我们的关键洞察力是, 可以将这组值估计值视为一种共合物, 我们称之为共合物。因此, 这些估计值之间的差异可以用作该代理商特征不确定性的替代物; 我们将这个信号 \ emph{ 模型值不一致} 或\ emph{ 自我不一致} 称为短信息。与先前通过培训许多模型和/或价值函数来估计不确定性的工作不同, 这种方法仅需要单一的模型和价值函数, 已经在大多数基于模型的强化学习算法中学习过。我们提供了列表和功能近似环境上的经验证据, 即自相矛盾( i) 作为勘探的信号, (ii) 在分销转移下安全地采取行动, 以及 (iii) 以模型为坚实的基于价值的规划。

0

相关内容

估计/估计量

估计/估计量

【2020Manning新书】微型化Python项目，325页pdf，Tiny Python Projects

【2020Manning新书】微型化Python项目，325页pdf，Tiny Python Projects

专知会员服务

45+阅读 · 2020年8月18日

【Manning新书】现代Java实战，592页pdf

【Manning新书】现代Java实战，592页pdf

专知会员服务

101+阅读 · 2020年5月22日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

已删除

将门创投

3+阅读 · 2018年10月11日

Linear Regression, Covariate Selection and the Failure of Modelling

Arxiv

0+阅读 · 2022年2月11日

PCENet: High Dimensional Surrogate Modeling for Learning Uncertainty

Arxiv

0+阅读 · 2022年2月11日

A Functional Operator for Model Uncertainty Quantification in the RKHS

A Functional Operator for Model Uncertainty Quantification in the RKHS

Arxiv

0+阅读 · 2022年2月10日

Learning more skills through optimistic exploration

Arxiv

0+阅读 · 2022年2月8日

Data Consistency for Weakly Supervised Learning

Arxiv

0+阅读 · 2022年2月8日

Uncertainty Modeling for Out-of-Distribution Generalization

Arxiv

9+阅读 · 2022年2月8日

Optimal pricing for electricity retailers based on data-driven consumers' price-response

Arxiv

0+阅读 · 2022年2月8日

Hyperparameter Ensembles for Robustness and Uncertainty Quantification

Arxiv

12+阅读 · 2020年6月24日

Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods

Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods

Arxiv

15+阅读 · 2020年4月3日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

【2020Manning新书】微型化Python项目，325页pdf，Tiny Python Projects

【2020Manning新书】微型化Python项目，325页pdf，Tiny Python Projects

专知会员服务

45+阅读 · 2020年8月18日

【Manning新书】现代Java实战，592页pdf

【Manning新书】现代Java实战，592页pdf

专知会员服务

101+阅读 · 2020年5月22日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【NeurIPS2025】面向时间序列基础模型的合成序列符号数据生成方法

军事通信市场七大趋势概述

【CMU博士论文】深度学习中泛化的量化、理解与改进

面向低光照图像增强的扩散模型

相关资讯

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

已删除

将门创投

3+阅读 · 2018年10月11日

相关论文

Linear Regression, Covariate Selection and the Failure of Modelling

Arxiv

0+阅读 · 2022年2月11日

PCENet: High Dimensional Surrogate Modeling for Learning Uncertainty

Arxiv

0+阅读 · 2022年2月11日

A Functional Operator for Model Uncertainty Quantification in the RKHS

A Functional Operator for Model Uncertainty Quantification in the RKHS

Arxiv

0+阅读 · 2022年2月10日

Learning more skills through optimistic exploration

Arxiv

0+阅读 · 2022年2月8日

Data Consistency for Weakly Supervised Learning

Arxiv

0+阅读 · 2022年2月8日

Uncertainty Modeling for Out-of-Distribution Generalization

Arxiv

9+阅读 · 2022年2月8日

Optimal pricing for electricity retailers based on data-driven consumers' price-response

Arxiv

0+阅读 · 2022年2月8日

Hyperparameter Ensembles for Robustness and Uncertainty Quantification

Arxiv

12+阅读 · 2020年6月24日

Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods

Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods

Arxiv

15+阅读 · 2020年4月3日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

微信扫码咨询专知VIP会员