We consider the problem of nonlinear stochastic optimal control, which is widely regarded as fundamentally intractable owing to Bellman's "curse of dimensionality". We show that repeatedly solving an open-loop deterministic problem from the current state with progressively shorter horizons, in a manner similar to Model Predictive Control (MPC), yields a feedback policy that is within $O(\epsilon^4)$ of the true global stochastic optimal policy, where $\epsilon$ is a perturbation parameter modulating the noise. We also show that the optimal deterministic feedback problem has a perturbation structure in which higher-order terms of the feedback law do not affect lower-order terms, and that this structure is lost in the optimal stochastic feedback problem. Consequently, solving the Stochastic Dynamic Programming problem is highly susceptible to noise, even in low-dimensional problems, and in practice the MPC-type feedback law offers superior performance even at high noise levels.
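The shrinking-horizon replanning loop described above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from this work: the scalar dynamics `step`, the quadratic cost in `open_loop_cost`, the time step `DT`, and the crude finite-difference gradient solver `solve_open_loop`, which finds only a local solution of the deterministic problem. The noise enters scaled by `eps`, standing in for the perturbation parameter $\epsilon$.

```python
import math
import random

DT = 0.1  # hypothetical discretization step


def step(x, u, noise=0.0):
    # Hypothetical scalar nonlinear dynamics, chosen only for illustration.
    return x + DT * (math.sin(x) + u) + noise


def open_loop_cost(x0, controls):
    # Noise-free cost of applying the control sequence from state x0.
    x, cost = x0, 0.0
    for u in controls:
        cost += DT * (x * x + u * u)
        x = step(x, u)
    return cost + 10.0 * x * x  # terminal penalty


def solve_open_loop(x0, horizon, iters=200, h=1e-6):
    # Crude local solver: finite-difference gradient descent with backtracking.
    u = [0.0] * horizon
    best = open_loop_cost(x0, u)
    for _ in range(iters):
        grad = []
        for i in range(horizon):
            bumped = u[:i] + [u[i] + h] + u[i + 1:]
            grad.append((open_loop_cost(x0, bumped) - best) / h)
        lr = 1.0
        while lr > 1e-8:
            trial = [ui - lr * gi for ui, gi in zip(u, grad)]
            c = open_loop_cost(x0, trial)
            if c < best:  # accept only strictly improving steps
                u, best = trial, c
                break
            lr *= 0.5
        else:
            break  # no improving step found; treat as converged
    return u


def shrinking_horizon_rollout(x0, horizon, eps, seed=0):
    # At step k, replan deterministically over the remaining horizon - k
    # steps from the current (noisy) state, then apply only the first control.
    rng = random.Random(seed)
    x, states = x0, [x0]
    for k in range(horizon):
        plan = solve_open_loop(x, horizon - k)
        noise = eps * math.sqrt(DT) * rng.gauss(0.0, 1.0)
        x = step(x, plan[0], noise)
        states.append(x)
    return states
```

For example, `shrinking_horizon_rollout(1.0, 8, eps=0.1)` returns the list of visited states under this replanning scheme. The sketch only shows the structure of the loop; it makes no attempt to reproduce the $O(\epsilon^4)$ result.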