In this paper, we tackle the problem of Unmanned Aerial Vehicle (UAV) path planning in complex and uncertain environments by designing a Model Predictive Control (MPC) scheme based on a Long Short-Term Memory (LSTM) network and integrating it into the Deep Deterministic Policy Gradient (DDPG) algorithm. In the proposed solution, LSTM-MPC operates as a deterministic policy within the DDPG network and leverages a predicting pool that stores predicted future states and actions to improve robustness and efficiency. The predicting pool also enables initialization of the critic network, which improves convergence speed and reduces the failure rate compared to traditional reinforcement learning and deep reinforcement learning methods. The effectiveness of the proposed solution is evaluated through numerical simulations.
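The abstract does not specify how the predicting pool is organized, so the following is only a minimal sketch of one plausible reading: a fixed-size buffer that stores predicted (state, action, next state) tuples produced by rolling a predictor forward under a deterministic policy. The class name `PredictingPool`, its methods, and the one-dimensional `toy_predictor` and `toy_policy` are hypothetical stand-ins for illustration; they are not the paper's LSTM-MPC predictor or DDPG actor.

```python
import random
from collections import deque


class PredictingPool:
    """Fixed-size buffer of predicted (state, action, next_state) tuples.

    Hypothetical structure: the abstract only says the pool stores
    predicted future states and actions.
    """

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest entry once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def add_rollout(self, state, policy, predictor, horizon):
        """Roll the predictor forward `horizon` steps, storing each predicted transition."""
        for _ in range(horizon):
            action = policy(state)
            next_state = predictor(state, action)
            self.buffer.append((state, action, next_state))
            state = next_state
        return state

    def sample(self, batch_size, rng=random):
        """Draw a batch uniformly without replacement, capped at the pool size."""
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


# Assumed toy 1-D dynamics and policy, standing in for the LSTM predictor
# and the deterministic DDPG policy described in the abstract.
def toy_predictor(state, action):
    return state + 0.1 * action


def toy_policy(state):
    return -state  # steer toward the origin


pool = PredictingPool(capacity=5)
final_state = pool.add_rollout(1.0, toy_policy, toy_predictor, horizon=8)
batch = pool.sample(3)
```

Under this reading, batches sampled from the pool could be used to pre-train a critic before any real environment interaction, which is one way the critic initialization mentioned above might be realized.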