Safety constraints and optimality are important, but sometimes conflicting criteria for controllers. Although these criteria are often solved separately with different tools to maintain formal guarantees, it is also common practice in reinforcement learning to simply modify reward functions by penalizing failures, with the penalty treated as a mere heuristic. We rigorously examine the relationship of both safety and optimality to penalties, and formalize sufficient conditions for safe value functions (SVFs): value functions that are both optimal for a given task, and enforce safety constraints. We reveal this structure by examining when rewards preserve viability under optimal control, and show that there always exists a finite penalty that induces a safe value function. This penalty is not unique, but upper-unbounded: larger penalties do not harm optimality. Although it is often not possible to compute the minimum required penalty, we reveal clear structure of how the penalty, rewards, discount factor, and dynamics interact. This insight suggests practical, theory-guided heuristics to design reward functions for control problems where safety is important.
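To make the core claim concrete, here is a minimal, illustrative sketch (not the paper's construction): a toy deterministic MDP in which the shortest route to the goal crosses an unsafe state. Running value iteration with increasing failure penalties shows that a finite penalty suffices to make the optimal policy avoid the unsafe state, and that raising the penalty further leaves the optimal value of the safe states unchanged. All states, rewards, the discount factor, and the penalty values below are assumed for illustration only.

```python
# Toy example: a sufficiently large (but finite) failure penalty induces a safe
# optimal policy, and larger penalties do not harm optimality on safe states.

GAMMA = 0.9                                   # discount factor (illustrative)
STATES = ["start", "unsafe", "d1", "d2", "goal"]
ACTIONS = ["short", "long"]                   # "short" cuts through the unsafe state

def build_mdp(penalty):
    """Deterministic transitions and rewards for the toy problem.

    The "short" route from start crosses the unsafe state and reaches the goal
    in two steps; the "long" route detours through d1 and d2 and takes three.
    Entering the unsafe state incurs the failure penalty.
    """
    nxt = {
        ("start", "short"): "unsafe", ("start", "long"): "d1",
        ("unsafe", "short"): "goal",  ("unsafe", "long"): "goal",
        ("d1", "short"): "d2",        ("d1", "long"): "d2",
        ("d2", "short"): "goal",      ("d2", "long"): "goal",
        ("goal", "short"): "goal",    ("goal", "long"): "goal",
    }
    rew = {key: 0.0 for key in nxt}
    rew[("start", "short")] = -penalty        # penalty on entering the unsafe state
    for a in ACTIONS:                         # +1 for reaching the goal
        rew[("unsafe", a)] = 1.0
        rew[("d2", a)] = 1.0
    return nxt, rew

def value_iteration(penalty, iters=200):
    """Return the optimal value function and greedy policy for a given penalty."""
    nxt, rew = build_mdp(penalty)
    V = {s: 0.0 for s in STATES}
    for _ in range(iters):
        V = {s: max(rew[(s, a)] + GAMMA * V[nxt[(s, a)]] for a in ACTIONS)
             for s in STATES}
    greedy = {s: max(ACTIONS, key=lambda a, s=s: rew[(s, a)] + GAMMA * V[nxt[(s, a)]])
              for s in STATES}
    return V, greedy

for p in [0.0, 0.05, 0.2, 100.0]:
    V, pi = value_iteration(p)
    print(f"penalty={p:6.2f}  V(start)={V['start']:+.3f}  action at start: {pi['start']}")
```

In this sketch, any penalty above a small finite threshold (here about 0.09) flips the optimal action at the start state from the unsafe shortcut to the safe detour, after which V(start) stays at the safe-path value no matter how much larger the penalty is made. This mirrors the abstract's point that the sufficient penalty is not unique but upper-unbounded, while computing the exact minimum threshold generally requires knowledge of the rewards, discount factor, and dynamics.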