Derivative-based optimization methods are efficient at solving optimal control problems near local optima. However, they stop converging when derivative information vanishes. The inference approach to optimal control places no strict requirements on the objective landscape. However, sampling, the primary tool for solving such problems, tends to be much slower computationally. We propose a new method that combines second-order methods with inference. We use the Kullback-Leibler (KL) control framework to formulate an inference problem that computes the optimal controls from an adaptive distribution approximating the solution of the second-order method. Our method allows simple convex and non-convex cost functions to be combined. This simplifies cost function design and leverages the strengths of both inference and second-order optimization. We compare our method to Model Predictive Path Integral (MPPI) control and the iterative Linear Quadratic Gaussian (iLQG) method, outperforming both in sample efficiency and solution quality on manipulation and obstacle avoidance tasks.
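To make the inference step concrete, the following is a minimal sketch of the exponentiated-cost weighting used in MPPI-style path-integral (KL) control, assuming a Gaussian sampling distribution centred on a nominal control sequence that, in the setting described above, would come from a second-order solver such as iLQG. The function name `kl_control_update` and the parameters `sigma`, `lam`, and `num_samples` are hypothetical illustrations, not the authors' method or API. For brevity it also omits the proposal-correction (control-cost) term that a full KL-control derivation adds to each sample's cost.

```python
import math
import random
from typing import Callable, List, Sequence


def kl_control_update(
    nominal: Sequence[float],
    cost: Callable[[Sequence[float]], float],
    sigma: float = 0.5,
    lam: float = 1.0,
    num_samples: int = 256,
    seed: int = 0,
) -> List[float]:
    """Hypothetical sketch of a path-integral / KL-control style update.

    Samples control sequences from a Gaussian centred on `nominal` (assumed
    to be the output of a second-order solver), weights each sample by
    exp(-cost / lam), and returns the weighted mean control sequence.
    The proposal-correction (control-cost) term is deliberately omitted.
    """
    rng = random.Random(seed)
    horizon = len(nominal)

    # Draw perturbed control sequences around the nominal solution.
    samples = [
        [u + rng.gauss(0.0, sigma) for u in nominal]
        for _ in range(num_samples)
    ]
    costs = [cost(s) for s in samples]

    # Shift by the minimum cost before exponentiating for numerical
    # stability; the shift cancels once the weights are normalised.
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(weights)

    # Weighted average of the sampled sequences, one time step at a time.
    return [
        sum(w * s[t] for w, s in zip(weights, samples)) / total
        for t in range(horizon)
    ]
```

For example, with the quadratic cost `sum((u - 1.0) ** 2 for u in seq)` and a nominal sequence of zeros, the weighted mean moves each control toward the low-cost region around 1.0. Because the weighting only requires cost evaluations, the same update applies unchanged to non-convex or non-differentiable costs, which is the property the abstract attributes to the inference approach; the temperature `lam` trades off how strongly low-cost samples dominate the average.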