The relationship between safety and optimality in control is not well understood, and they are often seen as important yet conflicting objectives. There is a pressing need to formalize this relationship, especially given the growing prominence of learning-based methods. Indeed, it is common practice in reinforcement learning to simply modify reward functions by penalizing failures, with the penalty treated as a mere heuristic. We rigorously examine this relationship, and formalize the requirements for safe value functions: value functions that are both optimal for a given task and enforce safety. We reveal the structure of this relationship through a proof of strong duality, showing that there always exists a finite penalty that induces a safe value function. This penalty is not unique; rather, the set of inducing penalties is unbounded above: larger penalties do not harm optimality. Although it is often not possible to compute the minimum required penalty, we reveal a clear structure in how the penalty, rewards, discount factor, and dynamics interact. This insight suggests practical, theory-guided heuristics for designing reward functions for control problems where safety is important.
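As an illustration of the reward-modification practice referred to above, one can write a penalized objective of the following form. This is a minimal sketch in illustrative notation, not the paper's own formulation: $r$ denotes the task reward, $\gamma$ the discount factor, $\mathcal{X}_{\mathrm{fail}}$ the failure set, and $\zeta$ the penalty added to failures.
\[
V_{\zeta}^{*}(x_{0}) \;=\; \max_{\pi}\; \mathbb{E}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\bigl(r(x_{t},u_{t}) \;-\; \zeta\,\mathbf{1}\{x_{t} \in \mathcal{X}_{\mathrm{fail}}\}\bigr)\right]
\]
In this notation, the existence result sketched in the abstract reads: there is a finite threshold $\bar{\zeta}$ such that for every $\zeta \ge \bar{\zeta}$, the resulting optimal value function $V_{\zeta}^{*}$ is a safe value function, i.e., its optimal policies avoid $\mathcal{X}_{\mathrm{fail}}$ while remaining optimal for the original task.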