In safe MDP planning, a cost function based on the current state and action is often used to specify safety aspects. In the real world, however, the state representation used often lacks sufficient fidelity to specify such safety constraints, and operating with an incomplete model can produce unintended negative side effects (NSEs). To address these challenges, we first associate safety signals with state-action trajectories (rather than just an immediate state-action pair), which makes our safety model highly general. We also assume that categorical safety labels are given for different trajectories, rather than a numerical cost function, which is harder for a problem designer to specify; we then employ a supervised learning model to learn such non-Markovian safety patterns. Second, we develop a Lagrange multiplier method that incorporates the safety model and the underlying MDP model into a single computation graph, enabling the agent to learn safe behaviors. Finally, our empirical results on a variety of discrete and continuous domains show that this approach can satisfy complex non-Markovian safety constraints while optimizing an agent's total returns, is highly scalable, and outperforms the previous best approach for Markovian NSEs.
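The following is a minimal sketch of the two components described above, assuming PyTorch; the GRU-based classifier, the tensor dimensions, the violation budget, and the multiplier-update scheme are illustrative assumptions rather than the exact architecture or hyperparameters used in the paper.

```python
# Illustrative sketch (assumptions: PyTorch, hypothetical dimensions).
# (1) A sequence model that learns non-Markovian safety labels from
#     state-action trajectories.
# (2) A Lagrangian relaxation that couples the learned safety signal with
#     the policy objective in a single computation graph.
import torch
import torch.nn as nn

class TrajectorySafetyClassifier(nn.Module):
    """Predicts a categorical safety label for a whole state-action trajectory."""
    def __init__(self, obs_dim, act_dim, hidden=64, n_labels=2):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_labels)

    def forward(self, traj):            # traj: [batch, T, obs_dim + act_dim]
        _, h = self.rnn(traj)           # final hidden state summarizes the trajectory
        return self.head(h.squeeze(0))  # logits over safety labels

def lagrangian_losses(policy_returns, unsafe_prob, log_lambda, budget=0.05):
    """Lagrangian relaxation: maximize returns subject to P(unsafe) <= budget."""
    lam = log_lambda.exp()              # exponentiation keeps the multiplier non-negative
    violation = unsafe_prob - budget    # positive when the safety constraint is violated
    # The policy minimizes returns-plus-penalty; the multiplier is updated by
    # gradient ascent on the same violation term (hence the detach calls).
    policy_loss = -policy_returns.mean() + (lam.detach() * violation).mean()
    lambda_loss = -(lam * violation.detach()).mean()
    return policy_loss, lambda_loss
```

In such a setup, the policy parameters and the Lagrange multiplier are typically updated in alternation, with the classifier's predicted violation probability flowing into the policy loss so that both models share one computation graph.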