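Project title: Research on Clustering Validity Analysis and Optimization Algorithms for Categorical Data
Project number: No.61305073
Project type: Young Scientists Fund project
Approval year: 2014
Discipline: Automation technology, computer technology
Project author: Liang Bai (白亮)
Affiliation: Shanxi University (山西大学)
Funding: 260,000 yuan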
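Abstract (translated from the Chinese): Large amounts of categorical data (a type of non-numerical data), such as bioinformatics data, Web data, and customer transaction data, arise in everyday life, so how to cluster such data has become an important research problem in data mining and has attracted wide attention. Taking categorical data as its research object, this project will apply statistical analysis and optimization methods to systematically study clustering validity for categorical data and the related optimization algorithms. The main research topics are: (1) the selection of clustering criteria and mutual learning among clustering algorithms for categorical data; (2) measures of the difference between clustering results on categorical data and the related optimization problems; (3) objective evaluation of clustering algorithm performance on categorical data sets with different characteristics; (4) experimental analysis on one or two real data sets with clear biological meaning. The results of this project will further enrich research on cluster analysis of categorical data and provide new theoretical foundations and technical support for data mining and knowledge discovery in related fields.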
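Keywords (translated from the Chinese): categorical data; cluster analysis; clustering validity; optimization model; optimization algorithm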
English abstract: Because a large amount of categorical data (a type of non-numerical data) exists in our lives, such as biological information data, Web data, and customer transaction data, how to cluster categorical data has become an important issue in data mining and has attracted wide attention. In this project, we take categorical data as the research subject and use statistical analysis and optimization theory to systematically study clustering validation and the related optimization algorithms. The main research contents include: (1) the selection of clustering criteria and mutual learning between clustering algorithms; (2) difference measures between clustering results from different data sets and the relevant optimization problems; (3) appropriate evaluation of the performance of clustering algorithms on data sets with different characteristics; (4) experimental analysis on some real-world biological information data. These contributions will further enrich cluster analysis for categorical data and provide a new theoretical basis and technical support for related studies.
English keywords: categorical data; cluster analysis; clustering validation; optimization model; optimization algorithm