In this paper, we consider the state estimation problem for nonlinear stochastic discrete-time systems. We combine Lyapunov's method from control theory with deep reinforcement learning to design the state estimator. We theoretically prove that the estimation error is bounded and convergent using only data simulated from the model. An actor-critic reinforcement learning algorithm is proposed to learn the state estimator, which is approximated by a deep neural network, and the convergence of the algorithm is analysed. The proposed Lyapunov-based reinforcement learning state estimator is compared with a number of existing nonlinear filtering methods through Monte Carlo simulations, and shows an advantage in estimation convergence even under system uncertainties such as a covariance shift in the system noise and randomly missing measurements. To the best of our knowledge, this is the first reinforcement-learning-based nonlinear state estimator with a bounded-estimation-error performance guarantee.
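As an illustration of the setting (the functions and the Lyapunov condition below are generic assumptions, not the paper's exact formulation), the estimator is designed for nonlinear stochastic discrete-time systems of the form
$$x_{k+1} = f(x_k) + w_k, \qquad y_k = h(x_k) + v_k,$$
where $w_k$ and $v_k$ denote process and measurement noise, and the estimator produces $\hat{x}_k$ from the measurements $y_k$. Writing the estimation error as $e_k = x_k - \hat{x}_k$, a stochastic Lyapunov drift condition of the type
$$\mathbb{E}\!\left[V(e_{k+1}) \mid e_k\right] - V(e_k) \le -\alpha V(e_k) + \beta, \qquad \alpha \in (0,1],\ \beta \ge 0,$$
implies $\mathbb{E}[V(e_k)] \le (1-\alpha)^k \,\mathbb{E}[V(e_0)] + \beta/\alpha$, i.e., the error remains bounded in expectation; a condition of this kind is the type of guarantee referred to in the abstract.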