Reinforcement learning (RL) is a control approach that can handle nonlinear stochastic optimal control problems. However, despite its promise, RL has yet to see marked translation to industrial practice, primarily due to its inability to satisfy state constraints. In this work we aim to address this challenge. We propose an 'oracle'-assisted constrained Q-learning algorithm that guarantees the satisfaction of joint chance constraints with a high probability, which is crucial for safety-critical tasks. To achieve this, constraint tightenings (backoffs) are introduced and adjusted using Broyden's method, making them self-tuning. This results in a general methodology that can be embedded in approximate dynamic programming-based algorithms to ensure constraint satisfaction with high probability. Finally, we present case studies that analyze the performance of the proposed approach and compare this algorithm with model predictive control (MPC). The favorable performance of this algorithm signifies a step toward the incorporation of RL into real-world optimization and control of engineering systems, where constraints are essential in ensuring safety.
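The self-tuning of the backoffs described above can be read as a root-finding problem: choose the constraint tightening so that the estimated violation probability under the learned policy matches the allowed level, and update the tightening with Broyden's quasi-Newton iteration. The sketch below is only an illustration of that idea under simplifying assumptions (one backoff and one estimated violation probability per constraint, rather than a single joint chance constraint), not the paper's implementation; `estimate_violation` and all parameter names are hypothetical placeholders for a Monte Carlo evaluation of the trained policy under a given backoff vector.

```python
import numpy as np

def tune_backoffs(estimate_violation, b0, target=0.05, tol=1e-3, max_iter=50):
    """Adjust constraint backoffs b so that the estimated violation
    probability of each constraint matches `target`, using Broyden's
    quasi-Newton root-finding method (illustrative sketch only).

    estimate_violation(b): hypothetical callable returning, for each
    constraint, a Monte Carlo estimate of the violation probability when
    the policy is evaluated with backoff vector b (in practice this
    estimate is noisy, which this sketch ignores).
    """
    b = np.asarray(b0, dtype=float)
    F = estimate_violation(b) - target      # residual driven to zero
    # Violations shrink as backoffs grow, so start from a negative-identity
    # Jacobian guess.
    J = -np.eye(b.size)
    for _ in range(max_iter):
        if np.max(np.abs(F)) < tol:
            break
        step = -np.linalg.solve(J, F)       # quasi-Newton step
        b_new = np.maximum(b + step, 0.0)   # backoffs stay non-negative
        F_new = estimate_violation(b_new) - target
        db, dF = b_new - b, F_new - F
        # Broyden's rank-one Jacobian update
        J += np.outer(dF - J @ db, db) / (db @ db + 1e-12)
        b, F = b_new, F_new
    return b

# Purely synthetic usage example: violation probability exp(-2 b) per constraint.
if __name__ == "__main__":
    demo = lambda b: np.exp(-2.0 * b)
    print(tune_backoffs(demo, b0=np.array([0.1, 0.1]), target=0.05))
```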