In this work, we address the problem of learning provably stable neural network policies for stochastic control systems. While recent work has demonstrated the feasibility of certifying given policies via martingale theory, how to learn such policies in the first place remains largely unexplored. Here, we study the effectiveness of jointly learning a policy together with a martingale certificate that proves its stability, using a single learning algorithm. We observe that the joint optimization problem easily becomes stuck in local minima when started from a randomly initialized policy. Our results suggest that some form of policy pre-training is required for the joint optimization to successfully repair and verify the policy.
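To make the joint optimization concrete, the following is a minimal, dependency-free sketch (not the paper's algorithm) for a scalar linear stochastic system. The system `x' = a*x + b*u + w`, the linear policy `u = -k*x`, the quadratic certificate `V(x) = c*x**2`, and all constants (`a`, `b`, `sigma`, `eps`) are illustrative assumptions; the hinge loss penalizes violations of a supermartingale-style expected-decrease condition, and both the policy gain and the certificate coefficient are updated together by gradient descent.

```python
import numpy as np

# Toy illustration (assumed setup, not the paper's method): jointly optimize
# a linear policy u = -k*x and a quadratic certificate V(x) = c*x^2 for the
# scalar stochastic system  x' = a*x + b*u + w,  w ~ N(0, sigma^2).
# The supermartingale-style decrease condition enforced on sampled states is
#   E[V(x')] <= (1 - eps) * V(x)   for |x| >= 1,
# which here reads  c*((a - b*k)^2 * x^2 + sigma^2) <= c*(1 - eps)*x^2.
a, b, sigma, eps = 1.2, 1.0, 0.1, 0.1
xs = np.linspace(1.0, 5.0, 64)  # states outside a small neighborhood of 0

def loss(params):
    k, c = params
    c = np.clip(c, 0.5, 2.0)  # keep the certificate bounded away from zero
    expected_next_V = c * ((a - b * k) ** 2 * xs ** 2 + sigma ** 2)
    violation = expected_next_V - c * (1.0 - eps) * xs ** 2
    return np.mean(np.maximum(violation, 0.0))  # hinge on the decrease condition

def grad(params, h=1e-5):
    # Central finite differences keep the sketch dependency-free.
    g = np.zeros_like(params)
    for i in range(len(params)):
        e = np.zeros_like(params)
        e[i] = h
        g[i] = (loss(params + e) - loss(params - e)) / (2.0 * h)
    return g

params = np.array([0.0, 1.0])  # init: no feedback (k=0), unit certificate (c=1)
for _ in range(500):
    params -= 0.05 * grad(params)
    params[1] = np.clip(params[1], 0.5, 2.0)

k, c = params
print(f"gain k={k:.3f}, closed-loop factor |a-bk|={abs(a - b * k):.3f}, "
      f"loss={loss(params):.2e}")
```

In this convex toy problem the loss is driven to zero and the learned closed-loop factor `|a - b*k|` falls below 1; with neural network policies and certificates the same joint objective is non-convex, which is where the local-minima issue described above arises.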