We study stochastic policy gradient methods from the perspective of control-theoretic limitations. Our main result is that ill-conditioned linear systems in the sense of Doyle inevitably lead to noisy gradient estimates. We also give an example of a class of stable systems in which policy gradient methods suffer from the curse of dimensionality. Our results apply to both state feedback and partially observed systems.