We introduce CriticSMC, a new algorithm for planning as inference built from a composition of sequential Monte Carlo with learned Soft-Q function heuristic factors. These heuristic factors, obtained from parametric approximations of the marginal likelihood ahead, more effectively guide SMC towards the desired target distribution, which is particularly helpful for planning in environments with hard constraints placed sparsely in time. Compared with previous work, we modify the placement of such heuristic factors, which allows us to cheaply propose and evaluate large numbers of putative action particles, greatly increasing inference and planning efficiency. CriticSMC is compatible with informative priors, whose density function need not be known, and can be used as a model-free control algorithm. Our experiments on collision avoidance in a high-dimensional simulated driving task show that CriticSMC significantly reduces collision rates at a low computational cost while maintaining realism and diversity of driving behaviors across vehicles and environment scenarios.
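The core loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-D environment, the hand-written `toy_critic` (standing in for a learned soft-Q function), and all function and parameter names are assumptions made for illustration. Each particle draws several putative actions from a prior that is only sampled, never evaluated. The critic scores them without stepping the environment, one action is selected in proportion to exp(Q), and the heuristic factor is divided out again once the true hard constraint has been observed.

```python
import numpy as np

def toy_critic(state, action):
    """Hypothetical stand-in for a learned soft-Q function.

    Returns 0 for actions that keep the next state inside [-1, 1] and a
    steep finite penalty otherwise (an approximate log-likelihood of no infraction).
    """
    next_state = state + action
    return -50.0 * np.maximum(np.abs(next_state) - 1.0, 0.0)

def critic_guided_smc(n_particles=64, n_putative=32, horizon=20, seed=0):
    """Toy SMC planner with critic heuristic factors (illustrative only).

    Returns the final particle states and the number of constraint
    violations encountered before resampling.
    """
    rng = np.random.default_rng(seed)
    states = np.zeros(n_particles)
    rows = np.arange(n_particles)
    n_infractions = 0
    for _ in range(horizon):
        # 1. Sample putative actions from the prior (samples only, no density).
        actions = rng.normal(0.0, 0.5, size=(n_particles, n_putative))
        # 2. Score every putative action with the critic; no environment step needed.
        q = toy_critic(states[:, None], actions)
        # 3. Select one action per particle with probability proportional to
        #    exp(q), using the Gumbel-max trick.
        idx = np.argmax(q + rng.gumbel(size=q.shape), axis=1)
        chosen, q_chosen = actions[rows, idx], q[rows, idx]
        # Heuristic factor: log of the mean of exp(q) over putative actions.
        q_max = q.max(axis=1)
        log_mean_exp_q = q_max + np.log(np.exp(q - q_max[:, None]).mean(axis=1))
        # 4. Step the (toy, deterministic) environment and check the hard constraint.
        next_states = states + chosen
        infraction = np.abs(next_states) > 1.0
        n_infractions += int(infraction.sum())
        log_reward = np.where(infraction, -np.inf, 0.0)
        # 5. Weight: heuristic factor, minus the critic value of the chosen
        #    action, plus the true log-likelihood of the constraint.
        log_w = log_mean_exp_q - q_chosen + log_reward
        if not np.isfinite(log_w).any():
            raise RuntimeError("all particles violated the constraint")
        # 6. Multinomial resampling; weights reset to uniform afterwards.
        w = np.exp(log_w - log_w.max())
        ancestors = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        states = next_states[ancestors]
    return states, n_infractions
```

With `n_putative=1` the selection step is trivial and the weight reduces to the constraint likelihood alone, so the sketch falls back to a plain bootstrap particle filter. Comparing the infraction counts returned for `n_putative=1` and `n_putative=32` gives a rough feel for how much the critic-guided proposal helps in this toy setting.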