Reinforcement learning (RL) agents with pre-specified reward functions cannot provide guaranteed safety across the variety of circumstances that an uncertain system might encounter. To guarantee performance while ensuring the satisfaction of safety constraints across a variety of circumstances, this paper presents an assured autonomous control framework that empowers RL algorithms with metacognitive learning capabilities. More specifically, the reward function parameters of the RL agent are adapted in a metacognitive decision-making layer to assure the feasibility of the RL agent, that is, to assure that the policy learned by the RL agent satisfies safety constraints specified by signal temporal logic while achieving as much performance as possible. The metacognitive layer monitors for any possible future safety violation under the actions of the RL agent and employs a higher-layer Bayesian RL algorithm to proactively adapt the reward function of the lower-layer RL agent. To minimize higher-layer Bayesian RL intervention, the metacognitive layer uses a fitness function as a metric to evaluate the success of the lower-layer RL agent in satisfying safety and liveness specifications, and the higher-layer Bayesian RL intervenes only if there is a risk of lower-layer RL failure. Finally, a simulation example is provided to validate the effectiveness of the proposed approach.
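To make the two-layer structure concrete, the following is a minimal sketch of the monitor-then-intervene loop described above, assuming a toy scalar system, a simple "always |x| <= x_max" safety predicate whose robustness margin plays the role of the fitness function, and a candidate-weight search standing in for the higher-layer Bayesian RL step. All class names, dynamics, and parameters here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def stl_robustness(trajectory, x_max=1.0):
    # Robustness of the hypothetical STL safety predicate "always |x| <= x_max":
    # positive means the specification is satisfied with margin, negative means violated.
    return float(np.min(x_max - np.abs(trajectory)))

class LowerLayerAgent:
    """Placeholder for the lower-layer RL agent; its reward weights are the
    parameters the metacognitive layer is allowed to adapt."""
    def __init__(self, reward_weights):
        self.reward_weights = np.asarray(reward_weights, dtype=float)

    def rollout(self, horizon=50, rng=None):
        # Toy closed-loop dynamics under the current policy: a larger safety
        # weight (reward_weights[1]) pulls the state more strongly toward 0.
        rng = rng if rng is not None else np.random.default_rng()
        x, traj = 0.0, []
        for _ in range(horizon):
            x += rng.normal(scale=0.3) - 0.1 * self.reward_weights[1] * x
            traj.append(x)
        return np.array(traj)

class MetacognitiveLayer:
    """Monitors the fitness of the lower layer and intervenes only when the
    fitness signals a risk of violating the safety specification."""
    def __init__(self, fitness_threshold=0.1, candidates=None, rng=None):
        self.fitness_threshold = fitness_threshold
        # Candidate reward-weight settings the higher layer may propose.
        self.candidates = candidates or [np.array([1.0, w]) for w in (0.5, 1.0, 2.0, 4.0)]
        self.rng = rng if rng is not None else np.random.default_rng(0)

    def fitness(self, trajectory):
        return stl_robustness(trajectory)

    def maybe_intervene(self, agent, trajectory):
        if self.fitness(trajectory) >= self.fitness_threshold:
            return False  # lower layer is performing adequately; no intervention
        # Stand-in for the higher-layer Bayesian RL update: score candidate
        # reward weights by sampled rollouts and keep the best expected fitness.
        scores = []
        for w in self.candidates:
            trial = LowerLayerAgent(w)
            scores.append(np.mean([self.fitness(trial.rollout(rng=self.rng))
                                   for _ in range(5)]))
        agent.reward_weights = self.candidates[int(np.argmax(scores))]
        return True

if __name__ == "__main__":
    agent = LowerLayerAgent(reward_weights=[1.0, 0.5])
    meta = MetacognitiveLayer()
    for episode in range(10):
        traj = agent.rollout()
        intervened = meta.maybe_intervene(agent, traj)
        print(f"episode {episode}: fitness={meta.fitness(traj):+.3f}, intervened={intervened}")
```

The point of the sketch is only the division of labor: the lower layer acts under its current reward weights, the metacognitive layer evaluates a fitness (robustness) metric on the resulting trajectory, and the higher-layer adaptation of the reward parameters is triggered only when that metric indicates a risk of failure.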