We present an approach for approximately solving discrete-time stochastic optimal-control problems by combining direct trajectory optimization, deterministic sampling, and policy optimization. Our feedback motion-planning algorithm uses a quasi-Newton method to simultaneously optimize a reference trajectory, a set of deterministically chosen sample trajectories, and a parameterized policy. We demonstrate that this approach exactly recovers LQR policies in the case of linear dynamics, quadratic objective, and Gaussian disturbances. We also demonstrate the algorithm on several nonlinear, underactuated robotic systems to highlight its performance and ability to handle control limits, safely avoid obstacles, and generate robust plans in the presence of unmodeled dynamics.
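The claim that the method exactly recovers LQR policies in the linear-quadratic-Gaussian setting can be sanity-checked against the classical solution. Below is a minimal sketch (not the paper's algorithm) of the finite-horizon, discrete-time LQR policy computed by backward Riccati recursion, rolled out under Gaussian disturbances; the double-integrator matrices, horizon, and noise scale are illustrative assumptions.

```python
import numpy as np

# Finite-horizon discrete-time LQR via backward Riccati recursion.
# Illustrative double-integrator system; all matrices and the horizon
# are assumptions, not values from the paper.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])   # dynamics: x_{t+1} = A x_t + B u_t + w_t
B = np.array([[0.005],
              [0.1]])
Q = np.eye(2)                # state cost
R = 0.1 * np.eye(1)          # control cost
T = 50                       # horizon

# Backward pass: P_T = Q, then for t = T-1, ..., 0:
#   K_t = (R + B' P_{t+1} B)^{-1} B' P_{t+1} A
#   P_t = Q + A' P_{t+1} (A - B K_t)
P = Q.copy()
gains = []
for _ in range(T):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)
    gains.append(K)
gains.reverse()  # gains[t] is the time-varying feedback gain K_t

# Roll out the policy u_t = -K_t x_t under Gaussian disturbances.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])
for t in range(T):
    u = -gains[t] @ x
    x = A @ x + B @ u + 0.01 * rng.standard_normal(2)
print("final state:", x)
```

For this LQG setting, the time-varying gains above are the optimal policy, so a correct implementation of the paper's approach should converge to the same gains; for the nonlinear, underactuated systems in the experiments, no such closed-form reference exists.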