Automatic Construction of Basis Functions for Value Function Approximation in Markov Decision Processes

Project title: Automatic Construction of Basis Functions for Value Function Approximation in Markov Decision Processes

Project number: No.61273143

Project type: General Program (面上项目)

Year of approval: 2013

Discipline: Automation technology; computer technology

Project author: 程玉虎 (Cheng Yuhu)

Author affiliation: China University of Mining and Technology (中国矿业大学)

Project funding: 800,000 yuan (80万元)

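Abstract (translated from Chinese): Reinforcement learning is an effective approach to Markov decision problems whose model is unknown. For continuous-space reinforcement learning based on linear value function approximation, how well the basis functions are constructed directly affects the approximation accuracy of the Markov decision process (MDP) value function, and in turn the performance of the reinforcement learning method. This project therefore proposes to use the analytical ideas and methods of graph theory to study automatic construction of basis functions for MDP value function approximation. The work comprises: constructing a continuous-space state-action graph, to capture the differences between actions and to describe the basic topology of the MDP environment comprehensively; studying automatic construction of basis functions on the state-action graph, to improve the approximation accuracy and generalization ability of the MDP value function; designing a sparsity-oriented algorithm for automatic basis function selection, to reduce computation and storage costs and improve the learning efficiency of MDP value function approximation; and applying the proposed continuous-space reinforcement learning methods to typical Markov decision problems such as inverted pendulum balancing control, elevator group scheduling, and autonomous robot navigation, to verify their feasibility and effectiveness. The results can both extend the application domain of reinforcement learning to continuous spaces and further deepen and enrich existing reinforcement learning theory.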

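Keywords (translated from Chinese): Reinforcement learning; Markov decision process; value function; graph theory; transfer learning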

English abstract: Reinforcement learning is an effective method for solving Markov decision problems with an unknown model. For reinforcement learning in continuous spaces based on linear value function approximation, the construction of the basis functions directly influences the approximation accuracy of the value function of the Markov decision process (MDP), and hence the performance of reinforcement learning methods. Therefore, this project will use the ideas and methods of graph theory to study automatic construction of basis functions for MDP value function approximation. The main contents of the study are as follows. To capture the differences between actions and to describe the basic topological structure of the MDP environment comprehensively, a method for building a state-action graph for continuous spaces is proposed. To improve the approximation accuracy and generalization ability of the MDP value function, an automatic method for constructing basis functions defined on the state-action graph is proposed. To reduce computational and storage costs and to improve the learning efficiency of MDP value function approximation, a sparsity-oriented algorithm for automatic selection of basis functions is designed. In addition, the proposed new reinforcement learning metho

English keywords: Reinforcement learning; Markov decision process; Value function; Graph theory; Transfer learning
