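Project title: Research on Key Problems in Estimation of Distribution Learning
Project number: No.61203305
Project type: Young Scientists Fund
Year of approval: 2013
Discipline: Automation
Principal investigator: Fan Jiancong (樊建聪)
Institution: Shandong University of Science and Technology
Funding: 240,000 CNY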
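Chinese abstract: Estimation of distribution algorithms (EDAs) are a new class of evolutionary computation methods that combine probability density estimation theory with probabilistic model building, and they can be used to solve data-learning problems involving uncertainty, nonlinearity, and dynamics. This project studies key problems in data learning based on EDAs in three areas: performance analysis and evaluation of EDAs for classification learning; EDA-based classification learning on unstructured text data; and pattern learning with EDAs in massive and complex data domains. The key problems to be solved include: (1) theoretical analysis of the effectiveness of EDAs applied to classification learning; (2) variable extraction for unstructured text data sets and the design and construction of their probabilistic models; (3) the design of EDA-based pattern learning algorithms for massive data, and the design of optimization strategies within and between different patterns in massive data. The significance of this project is to use the probabilistic foundations of evolutionary learning and error risk estimation methods to carry out a basic theoretical analysis of estimation of distribution learning and to realize EDA-based massive data mining and text pattern discovery. This can serve complex new information technologies such as cloud computing, and it can also enrich the body of theory and methods for learning distribution patterns from data.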
Chinese keywords: Estimation of distribution algorithms; Data mining; Machine learning; Clustering;
English abstract: Estimation of distribution algorithms (EDAs) are an outgrowth of evolutionary computation that integrates probability density estimation with probabilistic model building. EDAs can be used to solve learning problems in which the data exhibit uncertainty, nonlinearity, and dynamics. The purpose of this project is to investigate key problems in learning from data based on EDAs, focusing on performance analysis and evaluation of EDA-based classification learning, EDA-based classification of unstructured text data, and pattern learning in massive and complex data domains based on probabilistic model estimation and building. The key problems to be solved include: (1) theoretical analysis of the validity and effectiveness of EDA-based classification learning; (2) variable extraction from unstructured text data sets and the building of their probabilistic models; (3) the design of EDA-based pattern learning algorithms for massive data sets and the analysis of optimization strategies within a pattern and among different patterns. The significance of the project is to take advantage of the probabilistic basis and error estimation theory of evolutionary learning to analyze estimation of distribution learning theoretically, mine massive data sets and discover text patterns bas
English keywords: Estimation of distribution algorithms; Data mining; Machine learning; Clustering;