In contrast to control-theoretic methods, model-free reinforcement learning (RL) methods still lack stability guarantees, which remains a significant problem. Jointly learning a policy and a Lyapunov function has recently emerged as a promising way to equip the whole system with a stability guarantee. However, the classical Lyapunov constraints introduced in prior work cannot stabilize the system during sampling-based optimization. We therefore propose Adaptive Stability Certification (ASC), which drives the system toward sampling-based stability. Because the ASC condition can guide the search for an optimal policy heuristically, we design the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm around it. Meanwhile, our algorithm avoids the optimization difficulty of current approaches, in which a variety of constraints are coupled into the objective. Evaluated on ten robotic tasks, our method achieves lower accumulated cost and fewer stability constraint violations than previous studies.
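The sketch below is a minimal, self-contained illustration of the general idea stated above: jointly optimizing a policy and a learned Lyapunov candidate under a sampling-based decrease condition enforced with a Lagrange multiplier. It is not the paper's exact ASC/ALAC formulation; the toy linear dynamics, network sizes, and coefficients are assumptions introduced purely for demonstration.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 4, 2

# Policy network pi(s) -> a and a Lyapunov candidate network (both illustrative).
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
lyapunov = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
log_lam = torch.zeros(1, requires_grad=True)  # Lagrange multiplier, log-parameterised

opt = torch.optim.Adam(
    list(policy.parameters()) + list(lyapunov.parameters()) + [log_lam], lr=3e-4
)

def L(s):
    # Squaring the network output keeps the Lyapunov value non-negative.
    return lyapunov(s).pow(2)

# Toy linear dynamics s' = A s + B a, standing in for sampled environment transitions.
A = 0.95 * torch.eye(obs_dim)
B = 0.1 * torch.randn(obs_dim, act_dim)

for step in range(1000):
    s = torch.randn(256, obs_dim)              # batch of sampled states
    a = policy(s)
    s_next = s @ A.T + a @ B.T                 # sampled next states under the policy
    cost = s.pow(2).sum(1, keepdim=True) + 0.01 * a.pow(2).sum(1, keepdim=True)

    # Sampling-based stability condition: require L(s') - L(s) <= -alpha * L(s)
    # on average over the sampled transitions.
    alpha = 0.1
    violation = (L(s_next) - L(s) + alpha * L(s).detach()).mean()

    lam = log_lam.exp()
    primal_loss = cost.mean() + lam.detach() * violation  # update policy + Lyapunov net
    dual_loss = -lam * violation.detach()                  # dual ascent on the multiplier

    opt.zero_grad()
    (primal_loss + dual_loss).backward()
    opt.step()
```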