To overcome the curses of dimensionality and modeling that afflict Dynamic Programming (DP) methods for solving Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice. Unlike traditional RL algorithms, which do not consider the structural properties of the optimal policy, we propose a structure-aware learning algorithm that exploits the ordered multi-threshold structure of the optimal policy, whenever such structure exists. We prove the asymptotic convergence of the proposed algorithm to the optimal policy. Owing to the reduction in the policy search space, the proposed algorithm offers considerable savings in storage and computational complexity over classical RL algorithms. Simulation results establish that the proposed algorithm converges faster than other RL algorithms.
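To make the storage saving concrete, the following is a minimal illustrative sketch (not the paper's algorithm) of how an ordered multi-threshold policy can be represented. Instead of storing an action for each of S states, only K sorted thresholds are kept, and the action is non-decreasing in the state; the threshold values used below are hypothetical.

```python
def threshold_policy(thresholds, state):
    """Return the action under an ordered multi-threshold policy.

    `thresholds` is a sorted list of K state values; the action is the
    number of thresholds at or below `state`, so the policy is
    monotonically non-decreasing in the state. Storage is O(K) instead
    of O(S) for a full state-to-action table.
    """
    return sum(1 for t in thresholds if state >= t)


# Hypothetical example: 10 states with thresholds at states 3 and 7
# induce a three-action policy (actions 0, 1, 2).
policy = [threshold_policy([3, 7], s) for s in range(10)]
```

A structure-aware learner would then search over the K threshold values rather than over all possible state-action mappings, which is the source of the reduced policy space mentioned above.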