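Project title: Research on an Autonomous Robot Behavior Learning System Based on an Integrated "Unsupervised-Supervised-Reinforcement" Learning Paradigm
Project number: No.61075096
Project type: General Program
Year of initiation/approval: 2011
Project discipline: Metallurgy and Metal Technology
Project author: Li Jun (李军)
Author affiliation: Chongqing University (重庆大学)
Project funding: 100,000 CNY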
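Chinese abstract: This project studied in depth the core components of the integrated "unsupervised-supervised-reinforcement" learning paradigm. First, we proposed a self-growing, self-pruning constructive neural network architecture based on GWR-RBF. By introducing a "perception-action" data cache window and a forgetting factor into the GWR (Growing When Required) network, approximating the RBF Gaussian basis function with a simple quadratic function, and dynamically adjusting the learning-rate threshold of the neural network, we realized a real-time online learning algorithm for the "unsupervised-supervised" module that learns the required robot behaviors while keeping the network structure stable and the computational cost low. Second, we studied an "Off-Policy" way of embedding for the reinforcement learning module, in which the behavior policy and the learning policy of Q-learning are separated into two channels. This enables online embedding of prior knowledge into the "one-step Q-learning" algorithm, so that Q-learning can be used both for learning from operator demonstrations and for real-time online optimization of robot behavior. Finally, we validated the proposed unsupervised-supervised and reinforcement learning paradigms separately on the PeopleBot and Khepera-II mobile robot platforms, and also designed and implemented a secondary-development platform for the PeopleBot human-machine interaction interface software. These results are partly reported in one English-language monograph and two EI-indexed papers that we have published.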
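Chinese keywords: "unsupervised-supervised-reinforcement" learning; self-growing and self-pruning constructive neural networks; robot behavior; real-time online optimization; learning from demonstration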
English abstract: This project investigates the core parts of the integration of the "unsupervised-supervised-reinforcement" learning paradigms. We first address constructive neural networks based on the growing and pruning mechanism of GWR-RBF networks. Online real-time learning of the required behaviors is implemented by introducing a sliding cache window and a forgetting factor into the GWR (Growing When Required) network, by approximating the Gaussian function with a quadratic function, and by dynamically adjusting the learning rate of the networks. We then present a "one-step" Q-learning algorithm for online embedding of prior knowledge, in which the "off-policy" approach is used to divide the learning into an estimation policy and a behavior policy, so that the PbD (Programming by Demonstration) approach can be used for robot behavior optimization. Finally, we validate the proposed learning paradigms on two real robots, PeopleBot and Khepera-II, and carry out the software design of the human-machine interface for PeopleBot. The above achievements are partially presented in two papers and one monograph written in English.
English keywords: unsupervised-supervised-reinforcement learning; growing-pruning neural networks; robot behavior; online real-time optimization; programming by demonstration
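
Illustrative sketch: the abstract describes one-step Q-learning in which the behavior policy and the estimation (learning) policy are kept separate, so that actions demonstrated by an operator can be embedded as prior knowledge. The minimal Python example below shows the general off-policy idea only. It is not the project's algorithm: the five-cell corridor environment, the reward, and all names (`demo_policy`, `step`, `q_update`, `learn_from_demonstration`, `greedy_action`) are hypothetical and chosen for illustration.

```python
from collections import defaultdict

ACTIONS = ["left", "right"]  # hypothetical action set


def demo_policy(state):
    """Behavior policy: a stand-in for the operator's demonstration (assumed)."""
    return "right"


def step(state, action, goal=4):
    """Hypothetical 1-D corridor: cells 0..goal, reward 1.0 on reaching the goal."""
    nxt = min(max(state + (1 if action == "right" else -1), 0), goal)
    reward = 1.0 if nxt == goal else 0.0
    return nxt, reward, nxt == goal


def q_update(q, state, action, reward, nxt, alpha=0.5, gamma=0.9):
    """One-step Q-learning update. The target uses the greedy (estimation)
    policy, regardless of which policy actually chose `action`."""
    best_next = max(q[(nxt, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def learn_from_demonstration(episodes=50, goal=4):
    """Fill the Q-table from demonstrated transitions only."""
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = demo_policy(state)            # behavior channel
            nxt, reward, done = step(state, action, goal)
            q_update(q, state, action, reward, nxt)  # learning channel
            state = nxt
    return q


def greedy_action(q, state):
    """Estimation policy: pick the action with the highest learned value."""
    return max(ACTIONS, key=lambda a: q[(state, a)])
```

Because the update target is computed from the greedy policy rather than from the demonstrator's next action, the learned Q-table can later be refined by the robot's own exploration without discarding what the demonstration provided.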