多目标SPIBB:Seldonian离岸政策改进,在有限多边发展方案中设置安全限制 (Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs) - 专知论文

会员服务 ·

0

策略改进 · Performer · 基准 · 约束 · 回合 ·

2021 年 5 月 31 日

Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs

翻译：多目标SPIBB:Seldonian离岸政策改进,在有限多边发展方案中设置安全限制

Harsh Satija,Philip S. Thomas,Joelle Pineau,Romain Laroche

We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning (RL) setting. We consider the scenario where: (i) we have a dataset collected under a known baseline policy, (ii) multiple reward signals are received from the environment inducing as many objectives to optimize. We present an SPI formulation for this RL setting that takes into account the preferences of the algorithm's user for handling the trade-offs for different reward signals while ensuring that the new policy performs at least as well as the baseline policy along each individual objective. We build on traditional SPI algorithms and propose a novel method based on Safe Policy Iteration with Baseline Bootstrapping (SPIBB, Laroche et al., 2019) that provides high probability guarantees on the performance of the agent in the true environment. We show the effectiveness of our method on a synthetic grid-world safety task as well as in a real-world critical care context to learn a policy for the administration of IV fluids and vasopressors to treat sepsis.

翻译：我们研究了在离线强化学习(RL)设置的限制下的安全政策改进问题,我们考虑了以下情况:(一)我们根据已知基线政策收集了数据集,(二)从环境中收到多种奖励信号,从而产生实现最佳目标的众多目标,我们为这一区域环境提出了安全政策改进方案,其中考虑到算法使用者在处理不同奖励信号的取舍方面的偏好,同时确保新政策至少按照每个目标执行基线政策,我们利用传统的SPI算法,并提出了以基线推进的安全政策渗透为基础的新颖方法(SPIB、Laroche等人,2019年),该方法为代理人在真实环境中的性能提供了很高的概率保证,我们展示了我们的方法在合成网络世界安全任务以及在现实世界的关键护理环境中的有效性,以学习管理IV液体和血管压抑剂的政策来治疗呼吸系统。

0

相关内容

策略改进

“CVPR 2021 接受论文列表 1663篇论文都在这了

专知会员服务

32+阅读 · 2021年6月12日

【硬核课】机器人学习课程，UT Austin朱玉可博士讲述自主机器人的人工智能与机器学习机器学习算法

【硬核课】机器人学习课程，UT Austin朱玉可博士讲述自主机器人的人工智能与机器学习机器学习算法

专知会员服务

40+阅读 · 2020年9月21日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

54+阅读 · 2020年9月7日

【KDD2020教程】多模态网络表示学习

【KDD2020教程】多模态网络表示学习

专知会员服务

132+阅读 · 2020年8月26日

【哥伦比亚大学】经济AI优化课程，Economics, AI, and Optimization

【哥伦比亚大学】经济AI优化课程，Economics, AI, and Optimization

专知会员服务

53+阅读 · 2020年2月15日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

经典书《斯坦福大学-多智能体系统》532页pdf，MULTIAGENT SYSTEMS Algorithmic, Game-Theoretic, and Logical Foundations

经典书《斯坦福大学-多智能体系统》532页pdf，MULTIAGENT SYSTEMS Algorithmic, Game-Theoretic, and Logical Foundations

专知会员服务

158+阅读 · 2020年1月29日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【IJCAI 2019 | tutorial】解决具有复杂策略空间的游戏中的问题 Solving Games With Complex Strategy Spaces，林肯大学|Hau Chan，卡内基梅隆大学|Fei Fang

【IJCAI 2019 | tutorial】解决具有复杂策略空间的游戏中的问题 Solving Games With Complex Strategy Spaces，林肯大学|Hau Chan，卡内基梅隆大学|Fei Fang

专知会员服务

29+阅读 · 2019年8月12日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

OpenAI丨深度强化学习关键论文列表

OpenAI丨深度强化学习关键论文列表

中国人工智能学会

17+阅读 · 2018年11月10日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

11+阅读 · 2018年11月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Safe Deep Reinforcement Learning for Multi-Agent Systems with Continuous Action Spaces

Arxiv

0+阅读 · 2021年8月9日

An Interleaved Approach to Trait-Based Task Allocation and Scheduling

An Interleaved Approach to Trait-Based Task Allocation and Scheduling

Arxiv

0+阅读 · 2021年8月5日

Lyapunov Robust Constrained-MDPs: Soft-Constrained Robustly Stable Policy Optimization under Model Uncertainty

Arxiv

0+阅读 · 2021年8月5日

Conservative Objective Models for Effective Offline Model-Based Optimization

Arxiv

4+阅读 · 2021年7月14日

Density Constrained Reinforcement Learning

Arxiv

6+阅读 · 2021年6月24日

Reinforcement Learning with Perturbed Rewards

Arxiv

4+阅读 · 2018年10月5日

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年9月17日

Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees

Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees

Arxiv

3+阅读 · 2018年8月2日

Reinforcement Learning for Solving the Vehicle Routing Problem

Arxiv

3+阅读 · 2018年5月21日

Distributed Constraint Optimization Problems and Applications: A Survey

Arxiv

5+阅读 · 2018年1月11日

VIP会员

文章信息

相关主题

相关VIP内容

“CVPR 2021 接受论文列表 1663篇论文都在这了

专知会员服务

32+阅读 · 2021年6月12日

【硬核课】机器人学习课程，UT Austin朱玉可博士讲述自主机器人的人工智能与机器学习机器学习算法

【硬核课】机器人学习课程，UT Austin朱玉可博士讲述自主机器人的人工智能与机器学习机器学习算法

专知会员服务

40+阅读 · 2020年9月21日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

54+阅读 · 2020年9月7日

【KDD2020教程】多模态网络表示学习

【KDD2020教程】多模态网络表示学习

专知会员服务

132+阅读 · 2020年8月26日

【哥伦比亚大学】经济AI优化课程，Economics, AI, and Optimization

【哥伦比亚大学】经济AI优化课程，Economics, AI, and Optimization

专知会员服务

53+阅读 · 2020年2月15日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

经典书《斯坦福大学-多智能体系统》532页pdf，MULTIAGENT SYSTEMS Algorithmic, Game-Theoretic, and Logical Foundations

经典书《斯坦福大学-多智能体系统》532页pdf，MULTIAGENT SYSTEMS Algorithmic, Game-Theoretic, and Logical Foundations

专知会员服务

158+阅读 · 2020年1月29日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【IJCAI 2019 | tutorial】解决具有复杂策略空间的游戏中的问题 Solving Games With Complex Strategy Spaces，林肯大学|Hau Chan，卡内基梅隆大学|Fei Fang

【IJCAI 2019 | tutorial】解决具有复杂策略空间的游戏中的问题 Solving Games With Complex Strategy Spaces，林肯大学|Hau Chan，卡内基梅隆大学|Fei Fang

专知会员服务

29+阅读 · 2019年8月12日

热门VIP内容

开通专知VIP会员享更多权益服务

【ACMMM2025教程】打击网络虚假信息视频：特征分析、检测与防范，170页ppt

海军无人系统：海上作战的演进而非革命

Nature 子刊 | SciToolAgent:知识图谱引导的科学工具智能体

多媒体顶会ACM Multimedia 2025各大奖项揭晓！格拉斯哥大学等获最佳论文，中科院自动化所等获最佳学生论文

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

OpenAI丨深度强化学习关键论文列表

OpenAI丨深度强化学习关键论文列表

中国人工智能学会

17+阅读 · 2018年11月10日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

11+阅读 · 2018年11月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

相关论文

Safe Deep Reinforcement Learning for Multi-Agent Systems with Continuous Action Spaces

Arxiv

0+阅读 · 2021年8月9日

An Interleaved Approach to Trait-Based Task Allocation and Scheduling

An Interleaved Approach to Trait-Based Task Allocation and Scheduling

Arxiv

0+阅读 · 2021年8月5日

Lyapunov Robust Constrained-MDPs: Soft-Constrained Robustly Stable Policy Optimization under Model Uncertainty

Arxiv

0+阅读 · 2021年8月5日

Conservative Objective Models for Effective Offline Model-Based Optimization

Arxiv

4+阅读 · 2021年7月14日

Density Constrained Reinforcement Learning

Arxiv

6+阅读 · 2021年6月24日

Reinforcement Learning with Perturbed Rewards

Arxiv

4+阅读 · 2018年10月5日

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年9月17日

Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees

Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees

Arxiv

3+阅读 · 2018年8月2日

Reinforcement Learning for Solving the Vehicle Routing Problem

Arxiv

3+阅读 · 2018年5月21日

Distributed Constraint Optimization Problems and Applications: A Survey

Arxiv

5+阅读 · 2018年1月11日

微信扫码咨询专知VIP会员