We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. In particular, we focus on characterizing the variance over values induced by a distribution over MDPs. Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation, but the over-approximation may result in inefficient exploration. We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values and explicitly characterizes the gap in previous work. Moreover, our uncertainty quantification technique is easily integrated into common exploration strategies and scales naturally beyond the tabular setting by using standard deep reinforcement learning architectures. Experiments in difficult exploration tasks, both in tabular and continuous control settings, show that our sharper uncertainty estimates improve sample-efficiency.
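To make the target quantity concrete, the sketch below estimates the variance over values induced by a distribution over MDPs by brute force rather than by the uncertainty Bellman equation proposed here. It is a minimal, illustrative example only: it assumes a small tabular MDP with known rewards, a fixed policy, and a hypothetical Dirichlet posterior over transition dynamics, then samples models from that posterior and evaluates the policy in each sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tabular problem size and discount factor.
S, A, gamma = 5, 2, 0.9

# Posterior over transitions: one Dirichlet per (s, a), with made-up
# concentration parameters standing in for observed transition counts.
alpha = rng.uniform(0.5, 2.0, size=(S, A, S))

# Known rewards and a fixed stochastic policy pi(a | s).
R = rng.uniform(0.0, 1.0, size=(S, A))
pi = np.full((S, A), 1.0 / A)

def policy_value(P):
    """Solve (I - gamma * P_pi) V = R_pi for one sampled transition model P."""
    R_pi = (pi * R).sum(axis=1)            # expected reward under pi, shape (S,)
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel under pi, shape (S, S)
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

# Monte-Carlo estimate of the posterior mean and variance of V^pi:
# sample MDPs from the posterior and evaluate the policy in each one.
values = []
for _ in range(2000):
    P = np.stack([[rng.dirichlet(alpha[s, a]) for a in range(A)] for s in range(S)])
    values.append(policy_value(P))
values = np.asarray(values)

print("posterior mean of V^pi:    ", values.mean(axis=0))
print("posterior variance of V^pi:", values.var(axis=0))
```

The per-state variance printed at the end is the posterior variance over values that the abstract refers to; the proposed uncertainty Bellman equation characterizes this quantity exactly, whereas this sampling baseline only approximates it at the cost of solving many sampled MDPs.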