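Research on Structure-Evolving Reinforcement Learning Methods under Multiple Objectives and Their Implementation Techniques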

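Project title: Research on Structure-Evolving Reinforcement Learning Methods under Multiple Objectives and Their Implementation Techniques

Project number: No.60874047

Project type: General Program

Year of approval: 2009

Discipline: Radio electronics, telecommunications technology

Project author: Zhao Jin (赵金)

Affiliation: Huazhong University of Science and Technology

Funding: 300,000 CNY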

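Chinese abstract (translated): Modern engineering systems are becoming increasingly large and complex, and traditional control theory and methods often fail to yield a controller that meets the system's performance specifications. To enable autonomous controller design for complex, unknown systems, this project carried out in-depth research in four areas and obtained a series of results: multi-objective optimization algorithms, the convergence speed and accuracy of reinforcement learning, structure-evolution strategies for neural networks, and dynamic reconfiguration of neural networks based on evolvable hardware. For multi-objective optimization, a multi-objective optimization algorithm with personal preferences was proposed, which can produce controllers with different performance characteristics according to the designer's preferences. For the convergence speed of reinforcement learning, a neural network weight initialization method based on Bayesian estimation was proposed, which ensures that the network has a certain level of control ability after initialization. For the convergence accuracy of reinforcement learning, a method combining simultaneous perturbation stochastic approximation with a genetic algorithm was proposed, which not only speeds up learning locally but also improves approximation accuracy and controller performance. For structure-evolution strategies, a self-structuring fuzzy neural network control algorithm and a correlation-based network pruning algorithm were proposed; both evolve the network structure and the weights simultaneously. For dynamic reconfiguration based on evolvable hardware, a coordinate rotation digital computer (CORDIC) algorithm that implements neuron functions inside an FPGA was proposed; relying on shifts and additions, it can compute the exponential function accurately.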

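Chinese keywords (translated): Reinforcement learning; multi-objective optimization; neural network; evolvable hardware; structure evolution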

English abstract: As modern application systems become large-scale and complicated, it is not always possible to design a controller that satisfies the system requirements using traditional control theories. In order to design controllers for unknown and complicated systems automatically, this project investigates four aspects: multi-objective optimization algorithms, the convergence rate and precision of reinforcement learning, configuration evolution strategies for neural networks, and dynamic reconfiguration of neural networks. For multi-objective optimization, it proposes an algorithm that automatically designs different controllers according to personal preference. For the convergence rate of reinforcement learning, it proposes a Bayesian-estimation-based initialization algorithm for neural networks, which gives the network basic control ability after initialization. For approximation precision, it proposes a hybrid algorithm that combines simultaneous perturbation stochastic approximation with a genetic algorithm. The hybrid algorithm not only accelerates the convergence of reinforcement learning but also improves its approximation precision. For configuration evolution of neural networks, it proposes a self-configuration algorithm for fuzzy neural networks and a pruning algorithm based on the correlation of nodes. Both algorithms can evolve the configuration and the weights of the neural network. For dynamic reconfiguration of neural networks, it proposes a new direct coordinate rotation digital computer (CORDIC) algorithm that is used to form the neurons of a neural network on an FPGA platform. The CORDIC algorithm achieves fast and accurate computation of the exponential function relying on shifts and summation.

English keywords: Reinforcement learning; Multi-objective optimization; Neural network; Evolvable hardware; Configuration evolution
