Research on Learning Classifier System Techniques for Large-Scale Multi-Step Learning Problems

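Project title: Research on Learning Classifier System Techniques for Large-Scale Multi-Step Learning Problems

Project number: No.61502274

Project type: Young Scientists Fund

Approval year: 2016

Discipline: Other

Project author: Zang Zhaoxiang (臧兆祥)

Author affiliation: China Three Gorges University (三峡大学)

Funding amount: CNY 200,000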

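Chinese abstract: Solving multi-step learning problems is one of the main problems in reinforcement learning research, with important and wide-ranging applications in robot path planning in unknown environments, computer game AI, control and scheduling, and other fields. Learning Classifier Systems (LCSs) have shown practical value in solving multi-step learning problems, but they struggle with large-scale instances of these problems. This project therefore investigates the main reasons why large-scale learning problems are hard to solve and, on that basis, builds solution mechanisms for LCSs on such problems. Specifically, it will: study how the discounted-reward reinforcement learning algorithms currently used in LCSs limit and hinder their performance, and replace them with several average-reward reinforcement learning algorithms to improve LCSs' support for long action chains; build an effective memory mechanism for LCSs to cope with the non-Markovian nature of large-scale learning problems; and solve multi-step learning problems with continuous state and action spaces from two directions, namely typical function approximation methods and generalized classifier systems developed from LCSs' own structural characteristics and generalization strengths. The results can provide a theoretical and technical basis for related applications.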

Chinese keywords: multi-step learning problems; reinforcement learning; learning classifier systems; average reward; continuous space

English abstract: Solving multi-step problems is one of the main research areas of reinforcement learning, with important and wide-ranging applications in fields such as robot navigation in unknown environments, computer game AI, and control. As a genetics-based machine learning technique, learning classifier systems (LCSs) have shown promise in solving multi-step problems, but they have difficulty solving large multi-step problems. This project analyzes the reasons behind these difficulties and develops solution mechanisms for LCSs in large multi-step problems. Specifically, it will: study the performance limitations caused by the discounted-reward reinforcement learning algorithms within LCSs, and replace them with average-reward reinforcement learning methods to support long action chains in large multi-step problems; develop an effective memory mechanism that lets LCSs cope with non-Markov problems, improving their effectiveness and robustness on such problems; and build LCSs that can address multi-step problems with continuous state and action spaces, using typical function approximation methods as well as the Generalized Classifier System, which builds on LCSs' special structural features and generalization ability. The results of this study can provide a theoretical and technical basis for applying LCSs in related fields.
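The abstract's motivation for replacing discounted reward with average reward can be illustrated with a small tabular sketch. This is not the project's method: it assumes a plain Q table keyed by (state, action) in place of an LCS classifier population, uses tabular R-learning (Schwartz, 1993) as one example of an average-reward method, and all names (`discounted_update`, `r_learning_update`, `chain_start_value`) and parameter values are hypothetical. The chain helper shows why long action chains are hard under discounting: with a reward only at the end of an n-step chain, the start state's discounted value is γ^(n-1).

```python
"""Hypothetical tabular sketch: discounted vs. average-reward (R-learning) updates.

A plain Q table keyed by (state, action) stands in for an LCS classifier
population; names and parameter values are illustrative assumptions.
"""
from collections import defaultdict


def discounted_update(q, s, a, r, s_next, actions, beta=0.2, gamma=0.71):
    """Move Q(s, a) toward the discounted target r + gamma * max_b Q(s_next, b)."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += beta * (target - q[(s, a)])
    return q[(s, a)]


def r_learning_update(q, rho, s, a, r, s_next, actions, beta=0.2, rho_rate=0.01):
    """One tabular R-learning step; updates q in place and returns the new rho.

    Q(s, a) learns values relative to the average-reward estimate rho,
    and rho is adjusted only when a is (after the update) a greedy action in s.
    """
    best_next = max(q[(s_next, b)] for b in actions)
    q[(s, a)] += beta * (r - rho + best_next - q[(s, a)])
    best_here = max(q[(s, b)] for b in actions)
    if q[(s, a)] == best_here:
        rho += rho_rate * (r - rho + best_next - best_here)
    return rho


def chain_start_value(length, gamma=0.71):
    """Discounted value of the first state of a deterministic chain
    0 -> 1 -> ... -> length with reward 1 only on the last transition.

    One backward sweep with beta = 1 gives the exact values, since the
    terminal state `length` is never updated and keeps Q = 0.
    """
    q = defaultdict(float)
    for s in reversed(range(length)):
        r = 1.0 if s == length - 1 else 0.0
        discounted_update(q, s, "fwd", r, s + 1, ["fwd"], beta=1.0, gamma=gamma)
    return q[(0, "fwd")]


if __name__ == "__main__":
    for n in (5, 20, 40):
        print(n, chain_start_value(n))  # equals gamma ** (n - 1) up to rounding
```

With γ = 0.71 (a value commonly used in XCS experiments), `chain_start_value(40)` is below 1e-4, so predictions near the start of a long chain become hard to tell apart. The R-learning update avoids this geometric shrinkage because it does not discount; it instead learns values relative to the average-reward estimate `rho`.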

English keywords: Multi-step learning problems; Reinforcement learning; Learning classifier system; Average reward; Continuous space
