反馈机制下贝叶斯马尔可夫决策过程的脱机风险感知策略选择方法 (An Offline Risk-aware Policy Selection Method for Bayesian Markov Decision Processes) - 专知论文

会员服务 ·

0

脱机 · 风险感知 · 马尔可夫决策过程 · 贝叶斯 · 反馈机制 ·

2023 年 4 月 11 日

An Offline Risk-aware Policy Selection Method for Bayesian Markov Decision Processes

翻译：反馈机制下贝叶斯马尔可夫决策过程的脱机风险感知策略选择方法

Giorgio Angelotti,Nicolas Drougard,Caroline Ponzoni Carvalho Chanel

from arxiv, Preprint, under review

In Offline Model Learning for Planning and in Offline Reinforcement Learning, the limited data set hinders the estimate of the Value function of the relative Markov Decision Process (MDP). Consequently, the performance of the obtained policy in the real world is bounded and possibly risky, especially when the deployment of a wrong policy can lead to catastrophic consequences. For this reason, several pathways are being followed with the scope of reducing the model error (or the distributional shift between the learned model and the true one) and, more broadly, obtaining risk-aware solutions with respect to model uncertainty. But when it comes to the final application which baseline should a practitioner choose? In an offline context where computational time is not an issue and robustness is the priority we propose Exploitation vs Caution (EvC), a paradigm that (1) elegantly incorporates model uncertainty abiding by the Bayesian formalism, and (2) selects the policy that maximizes a risk-aware objective over the Bayesian posterior between a fixed set of candidate policies provided, for instance, by the current baselines. We validate EvC with state-of-the-art approaches in different discrete, yet simple, environments offering a fair variety of MDP classes. In the tested scenarios EvC manages to select robust policies and hence stands out as a useful tool for practitioners that aim to apply offline planning and reinforcement learning solvers in the real world.

翻译：在计划的离线模型学习和强化学习场景中，有限的数据集制约了相关马尔可夫决策过程（MDP）价值函数的估计。因此，获得的策略在实际环境中的表现受到限制，尤其是当错误的策略部署可能导致灾难性后果时。出于这个原因，人们正在采取几种途径，以减少模型误差（或学习模型和真实模型之间的分布偏移），从而获得相对于模型不确定性风险感知的解决方案。但是，在最终应用中，从现有基线中选择哪个更好？在脱机计算时间不是问题而鲁棒性是重点的情况下，我们提出了利用开发 vs 小心（EvC）范式，它（1）优雅地结合了模型不确定性，遵守贝叶斯规范，（2）从提供的一组固定候选策略中选择策略，该候选策略由当前基线提供，使风险感知目标在贝叶斯后验之上最大化。我们在不同的离散，但简单的环境中验证了 EvC，提供了公平造型的马尔可夫决策过程（MDP）类的多样性。在测试的场景中，EvC能够选择稳健的策略，因此，对于旨在在现实世界中应用脱机规划和强化学习求解器的实践者来说，它是一个实用的工具。

0

相关内容

148页最新《深度强化学习》教程，148页ppt

148页最新《深度强化学习》教程，148页ppt

专知会员服务

77+阅读 · 2023年4月29日

【AI+军事】美国HRL实验室AAAI2020《基于强化学习的多智能体任务规划》，Multi-Agent Mission Planning with Reinforcement Learning

【AI+军事】美国HRL实验室AAAI2020《基于强化学习的多智能体任务规划》，Multi-Agent Mission Planning with Reinforcement Learning

专知会员服务

231+阅读 · 2022年4月10日

【决策Transformers 导论】Introducing Decision Transformers on Hugging Face 🤗

【决策Transformers 导论】Introducing Decision Transformers on Hugging Face 🤗

专知会员服务

67+阅读 · 2022年3月29日

【DeepMind】基于模型的强化学习，174页ppt，Model-Based Reinforcement Learning

【DeepMind】基于模型的强化学习，174页ppt，Model-Based Reinforcement Learning

专知会员服务

89+阅读 · 2021年1月12日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

专知会员服务

35+阅读 · 2019年12月12日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

专知

25+阅读 · 2018年4月29日

强化学习初探 - 从多臂老虎机问题说起

强化学习初探 - 从多臂老虎机问题说起

专知

10+阅读 · 2018年4月3日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

44+阅读 · 2015年12月31日

基于供应链视角的环境治理：策略选择与协同机制

国家自然科学基金

0+阅读 · 2014年12月31日

焦虑情绪对社会决策行为的影响

国家自然科学基金

2+阅读 · 2013年12月31日

无界区域最优控制问题的无限元方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于不均衡支持向量机的小企业信用风险评价理论与模型

国家自然科学基金

0+阅读 · 2012年12月31日

非凸稀疏先验图像恢复建模理论和算法

国家自然科学基金

0+阅读 · 2012年12月31日

奇异摄动问题各向异性自适应有限元

国家自然科学基金

0+阅读 · 2012年12月31日

知识员工反生产行为对知识创新的影响效应与干预策略研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于行为决策理论的决策分析方法及其应用研究

国家自然科学基金

4+阅读 · 2009年12月31日

土壤氮素行为及其模拟模型不确定性的Monte-Carlo分析

国家自然科学基金

0+阅读 · 2008年12月31日

Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo

Arxiv

0+阅读 · 2023年5月29日

A Benchmark Comparison of Imitation Learning-based Control Policies for Autonomous Racing

Arxiv

0+阅读 · 2023年5月28日

A Model-Based Solution to the Offline Multi-Agent Reinforcement Learning Coordination Problem

Arxiv

0+阅读 · 2023年5月26日

A Simulation Environment and Reinforcement Learning Method for Waste Reduction

Arxiv

0+阅读 · 2023年5月26日

SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits

Arxiv

0+阅读 · 2023年5月25日

Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL

Arxiv

0+阅读 · 2023年5月25日

Markov Decision Process with an External Temporal Process

Arxiv

0+阅读 · 2023年5月25日

DIFFER: Decomposing Individual Reward for Fair Experience Replay in Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2023年5月25日

A Survey of Meta-Reinforcement Learning

Arxiv

12+阅读 · 2023年1月19日

Active Learning for Domain Adaptation: An Energy-based Approach

Arxiv

13+阅读 · 2021年12月2日

VIP会员

文章信息

相关主题

马尔可夫决策过程

相关VIP内容

148页最新《深度强化学习》教程，148页ppt

148页最新《深度强化学习》教程，148页ppt

专知会员服务

77+阅读 · 2023年4月29日

【AI+军事】美国HRL实验室AAAI2020《基于强化学习的多智能体任务规划》，Multi-Agent Mission Planning with Reinforcement Learning

【AI+军事】美国HRL实验室AAAI2020《基于强化学习的多智能体任务规划》，Multi-Agent Mission Planning with Reinforcement Learning

专知会员服务

231+阅读 · 2022年4月10日

【决策Transformers 导论】Introducing Decision Transformers on Hugging Face 🤗

【决策Transformers 导论】Introducing Decision Transformers on Hugging Face 🤗

专知会员服务

67+阅读 · 2022年3月29日

【DeepMind】基于模型的强化学习，174页ppt，Model-Based Reinforcement Learning

【DeepMind】基于模型的强化学习，174页ppt，Model-Based Reinforcement Learning

专知会员服务

89+阅读 · 2021年1月12日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

专知会员服务

35+阅读 · 2019年12月12日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

【CMU博士论文】数据驱动决策中的激励、信息与不确定性

DGP双粒度提示框架：图增强大模型助力欺诈检测

【ICCV2025】ESSENTIAL：用于视频类增量学习的情景记忆与语义记忆整合

唯快不破：大型语言模型高效架构综述

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

【论文推荐】最新七篇强化学习相关论文—逻辑约束、综述、多任务深度强化学习、参数服务器、事件抽取、分层强化学习、过拟合研究

专知

25+阅读 · 2018年4月29日

强化学习初探 - 从多臂老虎机问题说起

强化学习初探 - 从多臂老虎机问题说起

专知

10+阅读 · 2018年4月3日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo

Arxiv

0+阅读 · 2023年5月29日

A Benchmark Comparison of Imitation Learning-based Control Policies for Autonomous Racing

Arxiv

0+阅读 · 2023年5月28日

A Model-Based Solution to the Offline Multi-Agent Reinforcement Learning Coordination Problem

Arxiv

0+阅读 · 2023年5月26日

A Simulation Environment and Reinforcement Learning Method for Waste Reduction

Arxiv

0+阅读 · 2023年5月26日

SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits

Arxiv

0+阅读 · 2023年5月25日

Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL

Arxiv

0+阅读 · 2023年5月25日

Markov Decision Process with an External Temporal Process

Arxiv

0+阅读 · 2023年5月25日

DIFFER: Decomposing Individual Reward for Fair Experience Replay in Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2023年5月25日

A Survey of Meta-Reinforcement Learning

Arxiv

12+阅读 · 2023年1月19日

Active Learning for Domain Adaptation: An Energy-based Approach

Arxiv

13+阅读 · 2021年12月2日

相关基金

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

44+阅读 · 2015年12月31日

基于供应链视角的环境治理：策略选择与协同机制

国家自然科学基金

0+阅读 · 2014年12月31日

焦虑情绪对社会决策行为的影响

国家自然科学基金

2+阅读 · 2013年12月31日

无界区域最优控制问题的无限元方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于不均衡支持向量机的小企业信用风险评价理论与模型

国家自然科学基金

0+阅读 · 2012年12月31日

非凸稀疏先验图像恢复建模理论和算法

国家自然科学基金

0+阅读 · 2012年12月31日

奇异摄动问题各向异性自适应有限元

国家自然科学基金

0+阅读 · 2012年12月31日

知识员工反生产行为对知识创新的影响效应与干预策略研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于行为决策理论的决策分析方法及其应用研究

国家自然科学基金

4+阅读 · 2009年12月31日

土壤氮素行为及其模拟模型不确定性的Monte-Carlo分析

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员