We consider the problem of formally verifying almost-sure (a.s.) asymptotic stability in discrete-time nonlinear stochastic control systems. While verifying stability in deterministic control systems is extensively studied in the literature, verifying stability in stochastic control systems is an open problem. The few existing works on this topic either consider only specialized forms of stochasticity or make restrictive assumptions on the system, rendering them inapplicable to learning algorithms with neural network policies. In this work, we present an approach for general nonlinear stochastic control problems with two novel aspects: (a) instead of classical stochastic extensions of Lyapunov functions, we use ranking supermartingales (RSMs) to certify a.s.~asymptotic stability, and (b) we present a method for learning neural network RSMs. We prove that our approach guarantees a.s.~asymptotic stability of the system, and we provide the first method for obtaining bounds on the stabilization time, which stochastic Lyapunov functions do not yield. Finally, we validate our approach experimentally on a set of nonlinear stochastic reinforcement learning environments with neural network policies.
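To make the RSM certificate concrete, the following minimal sketch empirically checks the expected-decrease condition E[V(f(x, w))] <= V(x) - eps on a toy example. The dynamics `step`, the candidate function `V(x) = |x|`, and all parameter values are illustrative assumptions standing in for a learned neural network RSM and a real stochastic system; they are not from the paper.

```python
import numpy as np

# Hypothetical 1-D closed-loop system x' = 0.9*x + w with small
# zero-mean Gaussian noise w (an assumption for illustration only).
def step(x, rng):
    return 0.9 * x + rng.normal(0.0, 0.01)

# Candidate RSM; a stand-in for a learned neural network certificate.
def V(x):
    return abs(x)

def expected_decrease(x, eps=0.05, n_samples=10_000, seed=0):
    """Empirically check the RSM condition E[V(step(x, w))] <= V(x) - eps
    at a single state x, estimating the expectation by Monte Carlo sampling."""
    rng = np.random.default_rng(seed)
    successor_values = [V(step(x, rng)) for _ in range(n_samples)]
    return float(np.mean(successor_values)) <= V(x) - eps

print(expected_decrease(1.0))
```

Note that near the origin the strict decrease by `eps` necessarily fails (e.g. at `x = 0.01`), which matches the theory: the RSM condition is only required outside a small target neighborhood of the equilibrium, and a sound verifier must check it symbolically over that region rather than at sampled points.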