Safe Deep Reinforcement Learning for Multi-Agent Systems with Continuous Action Spaces

Multi-agent control problems are an interesting application area for deep reinforcement learning models with continuous action spaces. Such real-world applications, however, typically come with critical safety constraints that must not be violated. To ensure safety, we enhance the well-known multi-agent deep deterministic policy gradient (MADDPG) framework by adding a safety layer to the deep policy network. In particular, we extend the idea of linearizing the single-step transition dynamics, as was done for single-agent systems in Safe DDPG (Dalal et al., 2018), to multi-agent settings. We additionally propose to circumvent infeasibility problems in the action correction step using soft constraints (Kerrigan & Maciejowski, 2000). Results from the theory of exact penalty functions can be used to guarantee that the soft-constrained formulation still satisfies the original constraints under mild assumptions. We empirically find that the soft formulation achieves a dramatic decrease in constraint violations, making safety available even during the learning procedure.
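The action-correction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Following Safe DDPG, each constraint is linearized as c_i(s') ≈ c_i(s) + g_i(s)^T a, where a is assumed here to be the concatenated joint action of all agents and g_i(s) the learned sensitivity of constraint i; the safety layer returns the action closest to the policy output a0 subject to c_i(s) + g_i(s)^T a ≤ C_i. In the soft version, each violation is charged an l1 penalty with weight rho instead of being forbidden, so the correction problem always has a solution. The function name `soft_safety_layer` and the solver (coordinate ascent on the dual, in the style of Hildreth's method, where the l1 penalty becomes the bound 0 ≤ lam_i ≤ rho on each multiplier) are hypothetical choices made so the sketch runs with the Python standard library only; the paper may use a different QP solver.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def soft_safety_layer(
    a0: Sequence[float],           # joint action proposed by the agents' policies (concatenated; assumption)
    c: Sequence[float],            # current constraint values c_i(s)
    G: Sequence[Sequence[float]],  # linearized sensitivities g_i(s), one row per constraint
    C: Sequence[float],            # constraint bounds C_i
    rho: float = float("inf"),     # l1 penalty weight; inf gives the hard-constrained layer
    iters: int = 500,
) -> List[float]:
    """Hypothetical helper: approximately solve
        min_a 0.5 * ||a - a0||^2 + rho * sum_i max(0, c_i + g_i . a - C_i)
    by coordinate ascent on the dual, where each multiplier lam_i lies in [0, rho].
    """
    b = [Ci - ci for ci, Ci in zip(c, C)]  # constraint i becomes g_i . a <= b_i
    lam = [0.0] * len(G)
    a = list(a0)  # invariant: a == a0 - sum_i lam[i] * G[i]
    for _ in range(iters):
        for i, g in enumerate(G):
            gg = dot(g, g)
            if gg == 0.0:
                continue  # the action cannot influence this constraint
            # Exact maximization of the dual along lam_i, clipped to the box [0, rho].
            new = min(max(lam[i] + (dot(g, a) - b[i]) / gg, 0.0), rho)
            step = new - lam[i]
            if step != 0.0:
                a = [aj - step * gj for aj, gj in zip(a, g)]
                lam[i] = new
    return a


# Two agents with 1-D actions share the constraint a_1 + a_2 <= 1.
print(soft_safety_layer([1.0, 1.0], [0.0], [[1.0, 1.0]], [1.0]))           # hard layer
print(soft_safety_layer([1.0, 1.0], [0.0], [[1.0, 1.0]], [1.0], rho=0.2))  # soft, small rho
```

In this toy example the hard layer (rho = inf) projects the joint action [1.0, 1.0] to [0.5, 0.5] with multiplier 0.5, and any rho ≥ 0.5 returns the same action, which is the exact-penalty property the abstract relies on. With rho = 0.2 the correction stops at [0.8, 0.8] and accepts a violation of 0.6. For contradictory constraints such as a ≤ 0 and a ≥ 1, the hard problem is infeasible, yet the soft layer with rho = 1.0 still returns a finite action.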
