The action governor is an add-on scheme to a nominal control loop that monitors and adjusts the control actions to enforce safety specifications expressed as pointwise-in-time state and control constraints. In this paper, we introduce the Robust Action Governor (RAG) for systems whose dynamics can be represented by discrete-time Piecewise Affine (PWA) models with both parametric and additive uncertainties and which are subject to non-convex constraints. We develop the theoretical properties of the RAG and computational approaches for it. We then use the RAG to realize safe Reinforcement Learning (RL), i.e., to ensure all-time constraint satisfaction during the online RL exploration-and-exploitation process. This development enables safe real-time evolution of the control policy and its adaptation to changes in the operating environment and system parameters (due to aging, damage, etc.). We illustrate the effectiveness of the RAG, both in constraint enforcement and in safe RL, by applying it to a soft-landing problem for a mass-spring-damper system.
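To make the monitor-and-adjust idea concrete, the following is a minimal sketch of a one-step action governor for a scalar, single-mode system x+ = a*x + b*u + w with a uncertain in an interval and |w| bounded. It is an illustration under stated assumptions, not the RAG algorithm developed in the paper: the safe interval is assumed to be given and robustly controlled invariant, the action set is a finite grid, and all names (`worst_case_next_states`, `action_governor`, the numerical values) are hypothetical.

```python
def worst_case_next_states(x, u, a_bounds, b, w_max):
    """Smallest and largest next state of x+ = a*x + b*u + w.

    The next state is affine in (a, w), so its extremes over the box
    a in [a_lo, a_hi], |w| <= w_max are attained at the box vertices.
    """
    a_lo, a_hi = a_bounds
    candidates = [a * x + b * u + w
                  for a in (a_lo, a_hi)
                  for w in (-w_max, w_max)]
    return min(candidates), max(candidates)


def action_governor(x, u_nominal, u_grid, safe_interval, a_bounds, b, w_max):
    """Return the grid action closest to u_nominal that keeps the next
    state inside safe_interval for every admissible (a, w).

    Assumption: safe_interval is robustly controlled invariant, so a safe
    action exists whenever x lies inside it.
    """
    lo, hi = safe_interval
    safe_actions = []
    for u in u_grid:
        nxt_lo, nxt_hi = worst_case_next_states(x, u, a_bounds, b, w_max)
        if lo <= nxt_lo and nxt_hi <= hi:
            safe_actions.append(u)
    if not safe_actions:
        raise ValueError("no safe action on the grid for this state")
    # Minimal modification of the nominal action.
    return min(safe_actions, key=lambda u: abs(u - u_nominal))


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    u_grid = [i / 10 for i in range(-10, 11)]
    u_safe = action_governor(x=0.9, u_nominal=0.5, u_grid=u_grid,
                             safe_interval=(-1.0, 1.0),
                             a_bounds=(0.9, 1.1), b=1.0, w_max=0.05)
    print("adjusted action:", u_safe)
```

Near the constraint boundary the governor overrides the nominal action, while in the interior of the safe interval a safe nominal action passes through unchanged. This pass-through property is what allows an RL agent to explore freely wherever exploration is safe.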