Research on Parallel Off-Policy Reinforcement Learning Methods Based on Importance Sampling

Project title: Research on Parallel Off-Policy Reinforcement Learning Methods Based on Importance Sampling

Project number: No.61502329

Project type: Young Scientists Fund

Year approved: 2016

Discipline: Other

Principal investigator: 傅启明 (Fu Qiming)

Affiliation: Suzhou University of Science and Technology (苏州科技大学)

Funding amount: 210,000 yuan

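Chinese abstract (translated): In recent years, off-policy reinforcement learning has gradually become a research focus in the reinforcement learning field. Compared with on-policy reinforcement learning, the convergence analysis of off-policy methods is more complex from a theoretical standpoint; from a practical standpoint, however, research on off-policy methods will greatly advance the real-world application of reinforcement learning. This project focuses on the theory and application of approximate off-policy reinforcement learning, and the work is divided into four parts: 1) use weighted importance sampling to construct a value-function parameter update rule that can handle off-policy sample data, and propose an off-policy reinforcement learning algorithm based on weighted importance sampling; 2) prove theoretically that the proposed parameter update rule guarantees consistency between off-policy evaluation and on-policy evaluation; 3) building on the proposed off-policy algorithm, construct a parallel off-policy reinforcement learning framework suitable for real-time control; 4) apply the proposed parallel framework to a practical building energy-conservation problem, solve for the optimal energy-saving policy, and achieve real-time online control of the relevant equipment in the building. Through this research, the project will advance reinforcement learning theory to some extent and effectively address the difficulties of applying off-policy reinforcement learning methods in practice.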

Chinese keywords (translated): Reinforcement learning; off-policy; importance sampling; function approximation

English abstract: Recently, off-policy reinforcement learning has become a research focus in the reinforcement learning field. Compared with on-policy reinforcement learning, the convergence analysis of off-policy methods is theoretically more complicated, but from an application standpoint, research on off-policy methods will greatly promote the practical use of reinforcement learning. The project focuses on the theory and application of approximate off-policy reinforcement learning and is divided into the following four parts: 1) using the weighted importance sampling method, construct a novel parameter update rule for the off-policy case and propose an off-policy reinforcement learning algorithm; 2) prove theoretically that the parameter update rule is consistent between the on-policy and off-policy cases; 3) based on the proposed off-policy reinforcement learning algorithm, construct a parallel off-policy reinforcement learning framework for real-time control problems; 4) apply the proposed parallel framework to the building energy conservation problem, find the optimal policy, and control the related building equipment online and in real time. The above research will therefore promote the development of reinforcement learning theory to a certain extent and effectively address the difficulties of applying off-policy reinforcement learning in practice.

English keywords: Reinforcement Learning; Off-Policy; Importance Sampling; Function Approximation
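Item 1 of the abstract relies on weighted importance sampling to reuse data generated by a behavior policy when evaluating a different target policy. The abstract does not give the project's actual update rule (which targets value-function parameters under function approximation), so the following is only a minimal sketch of the textbook episode-level estimators, contrasting ordinary and weighted importance sampling. The function names and the episode format, a list of (target probability, behavior probability, reward) triples, are hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical episode format: one (target_prob, behavior_prob, reward) triple per step,
# where target_prob = pi(a_t | s_t) and behavior_prob = mu(a_t | s_t) for the action
# actually taken by the behavior policy mu. Assumes mu(a_t | s_t) > 0 (coverage).
Episode = Sequence[Tuple[float, float, float]]


def episode_ratio_and_return(episode: Episode, gamma: float) -> Tuple[float, float]:
    """Return the importance ratio rho = prod(pi / mu) and the discounted return G."""
    rho, g, discount = 1.0, 0.0, 1.0
    for target_prob, behavior_prob, reward in episode:
        rho *= target_prob / behavior_prob
        g += discount * reward
        discount *= gamma
    return rho, g


def ordinary_is(episodes: List[Episode], gamma: float = 1.0) -> float:
    """Ordinary IS: average of rho * G. Unbiased, but its variance can be very large."""
    pairs = [episode_ratio_and_return(ep, gamma) for ep in episodes]
    return sum(rho * g for rho, g in pairs) / len(pairs)


def weighted_is(episodes: List[Episode], gamma: float = 1.0) -> float:
    """Weighted IS: normalize by the sum of ratios. Biased but consistent, lower variance."""
    pairs = [episode_ratio_and_return(ep, gamma) for ep in episodes]
    total_weight = sum(rho for rho, _ in pairs)
    if total_weight == 0.0:
        return 0.0  # no episode has nonzero probability under the target policy
    return sum(rho * g for rho, g in pairs) / total_weight


if __name__ == "__main__":
    episodes = [
        [(1.0, 0.5, 1.0), (1.0, 0.5, 1.0)],  # rho = 4.0, G = 2.0
        [(0.5, 1.0, 3.0)],                   # rho = 0.5, G = 3.0
    ]
    print(ordinary_is(episodes))  # 4.75
    print(weighted_is(episodes))  # about 2.111 (= 9.5 / 4.5)
```

When the target and behavior policies coincide, every ratio equals 1 and both estimators reduce to the plain on-policy Monte Carlo average of returns. That is the simplest instance of the off-policy/on-policy consistency that item 2 sets out to prove for the project's own update rule.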
