Particle filtering is a popular method for inferring latent states in stochastic dynamical systems, whose theoretical properties have been well studied in machine learning and statistics communities. In many control problems, e.g., partially observed linear dynamical systems (POLDS), oftentimes the inferred latent state is further used for planning at each step. This paper initiates a rigorous study on the efficiency of particle filtering for sequential planning, and gives the first particle complexity bounds. Though errors in past actions may affect the future, we are able to bound the number of particles needed so that the long-run reward of the policy based on particle filtering is close to that based on exact inference. In particular, we show that, in stable systems, polynomially many particles suffice. Key in our proof is a coupling of the ideal sequence based on the exact planning and the sequence generated by approximate planning based on particle filtering. We believe this technique can be useful in other sequential decision-making problems.
翻译:粒子过滤是一种常用的方法,用于在随机动态系统中推断潜伏状态,其理论特性在机器学习和统计界中得到了很好的研究。在许多控制问题中,例如部分观测到的线性动态系统(POLDS),通常会进一步使用推断的潜伏状态来进行每一步的规划。本文对粒子过滤的效率进行严格的研究,以便进行顺序规划,并给出第一个粒子复杂性界限。虽然过去的行动有误会影响未来,但我们能够将需要的粒子数量捆绑起来,这样,基于粒子过滤的政策的长期回报就接近于基于精确推断的结果。特别是,我们证明在稳定的系统中,多颗粒就足够了。我们的证据之关键是,根据粒子过滤的精确规划和近似规划产生的序列,将理想序列相混合在一起。我们认为,这种技术可以在其他顺序决策问题上有用。