Many Imitation and Reinforcement Learning approaches rely on the availability of expert-generated demonstrations for learning policies or value functions from data. Obtaining a reliable distribution of trajectories from motion planners is non-trivial, since it must broadly cover the space of states likely to be encountered during execution while also satisfying task-based constraints. We propose a sampling strategy based on variational inference to generate distributions of feasible, low-cost trajectories for high-dof motion planning tasks. This includes a distributed, particle-based motion planning algorithm which leverages a structured graphical representation for inference over multi-modal posterior distributions. We also make explicit connections to both approximate inference for trajectory optimization and entropy-regularized reinforcement learning.
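The abstract does not fix a particular inference scheme, but one common particle-based variational method consistent with this description is Stein variational gradient descent (SVGD), which moves a set of trajectory particles toward low-cost regions while a kernel repulsion term keeps the set spread over multiple modes. Below is a minimal, self-contained sketch under that assumption; the cost terms, weights, step sizes, and names (`trajectory_score`, `svgd_step`) are illustrative, not the paper's implementation.

```python
# Minimal sketch: SVGD over trajectory particles with a hypothetical
# quadratic start/goal/smoothness cost. Posterior: p(tau) ~ exp(-cost(tau)).
import numpy as np

def trajectory_score(X, start, goal):
    """grad log p(tau) = -grad cost(tau) for particles X: (n, T, d)."""
    g = np.zeros_like(X)
    g[:, 0] -= 2.0 * (X[:, 0] - start)    # pin first waypoint to start
    g[:, -1] -= 2.0 * (X[:, -1] - goal)   # pull last waypoint to goal
    diff = X[:, 1:] - X[:, :-1]           # consecutive-waypoint gaps
    g[:, 1:] -= 2.0 * diff                # smoothness penalty gradient
    g[:, :-1] += 2.0 * diff
    return g

def svgd_step(X, score, h=0.5, step=5e-2):
    """One SVGD update on flattened particles X: (n, D)."""
    n = X.shape[0]
    sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * h))                          # RBF kernel
    gradK = -(X[:, None] - X[None, :]) / h * K[..., None]
    phi = (K.T @ score + gradK.sum(axis=0)) / n          # Stein direction
    return X + step * phi

n, T, d = 32, 16, 2
rng = np.random.default_rng(0)
start, goal = np.zeros(d), 5.0 * np.ones(d)
# initialize particles as noisy straight-line trajectories
base = np.linspace(start, goal, T)
X = base[None] + 0.5 * rng.standard_normal((n, T, d))
for _ in range(500):
    s = trajectory_score(X, start, goal)
    X = svgd_step(X.reshape(n, -1), s.reshape(n, -1)).reshape(n, T, d)
# X now holds low-cost trajectory particles whose residual spread
# reflects the kernel's repulsive (entropy-like) term
```

The kernel repulsion in `phi` is what distinguishes this from plain gradient-based trajectory optimization: it plays the role of the entropy term in entropy-regularized reinforcement learning, so the particles approximate a distribution over trajectories rather than collapsing to a single local optimum.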