Nonlinear trajectory optimization algorithms have been developed to handle optimal control problems with nonlinear dynamics and nonconvex constraints in trajectory planning. The performance and computational efficiency of many trajectory optimization methods are sensitive to the initial guess, i.e., the trajectory guess required by the iterative trajectory optimization algorithm. Motivated by this observation, we tackle the initialization problem for trajectory optimization via policy optimization. To optimize a policy, we propose a guided policy search method with two key components: i) a trajectory update; ii) a policy update. The trajectory update solves, offline, a large number of trajectory optimization problems from different initial states via Sequential Convex Programming (SCP), taking a single SCP step to generate the trajectory iterate for each problem. Around each iterate, we also generate additional trajectories via a feedback control law. All of these trajectories are then used by a stochastic gradient descent algorithm to update the neural network policy, i.e., the policy update step. As a result, the trained policy can generate trajectory candidates that are close to optimality and feasibility and that serve as excellent initial guesses for trajectory optimization methods. We validate the proposed method on a real-world 6-degree-of-freedom powered descent guidance problem for a reusable rocket.
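The loop below is a minimal sketch of the two-step scheme summarized above (one SCP step per sampled initial state, feedback-law rollouts around the iterate, then a stochastic-gradient policy regression). It is not the paper's implementation: the double-integrator dynamics, the proportional "scp_step" stand-in, the tracking gain K, and the linear policy are all hypothetical placeholders chosen only to keep the example self-contained and runnable.

```python
import numpy as np

# Sketch of the guided policy search loop: trajectory update + policy update.
# Everything below (dynamics, scp_step, K, the linear policy) is a toy stand-in.

rng = np.random.default_rng(0)
nx, nu, T = 6, 3, 20                           # toy state/control sizes and horizon
A = np.eye(nx); A[:3, 3:] = 0.1 * np.eye(3)    # double-integrator-like dynamics
B = np.zeros((nx, nu)); B[3:, :] = 0.1 * np.eye(nu)

def scp_step(x0):
    """Stand-in for a single SCP iteration: a crude proportional plan toward the origin."""
    xs, us, x = [x0], [], x0.copy()
    for _ in range(T):
        u = -0.5 * x[:nu] - 1.0 * x[3:3 + nu]  # hand-tuned position/velocity feedback
        x = A @ x + B @ u
        xs.append(x); us.append(u)
    return np.array(xs), np.array(us)

def rollout_with_feedback(x_ref, u_ref, K, noise=0.05):
    """Additional trajectory around an iterate via u = u_ref + K (x - x_ref)."""
    x = x_ref[0] + noise * rng.standard_normal(nx)
    xs, us = [x], []
    for t in range(T):
        u = u_ref[t] + K @ (x - x_ref[t])
        x = A @ x + B @ u
        xs.append(x); us.append(u)
    return np.array(xs), np.array(us)

# Linear policy u = W x + b trained by stochastic gradient descent on (x, u) pairs
# (a neural network policy would replace this in the actual method).
W = np.zeros((nu, nx)); b = np.zeros(nu); lr = 1e-2
K = -0.3 * np.hstack([np.eye(nu), np.eye(nu)])  # simple tracking gain for the rollouts

for epoch in range(50):
    x0 = rng.uniform(-1.0, 1.0, size=nx)        # sample a different initial state
    x_ref, u_ref = scp_step(x0)                 # trajectory update: one SCP step
    batch = [(x_ref, u_ref)] + [rollout_with_feedback(x_ref, u_ref, K) for _ in range(4)]
    for xs, us in batch:                        # policy update: SGD regression on all trajectories
        for x, u in zip(xs[:-1], us):
            grad = W @ x + b - u                # squared-error gradient
            W -= lr * np.outer(grad, x)
            b -= lr * grad

print("policy gain norm:", np.linalg.norm(W))
```

At deployment, the trained policy would be rolled out from the current state to produce a trajectory candidate that initializes the full SCP solver.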