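Project title: Research on Mining Massive High-Dimensional Celestial Spectral Data and Its Parallelization
Project number: No.61272263
Project type: General Program
Year of approval: 2013
Discipline: Automation technology; computer technology
Project author: Zhang Jifu (张继福)
Affiliation: Taiyuan University of Science and Technology (太原科技大学)
Funding: 800,000 CNY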
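Abstract (translated from Chinese): This project addresses the three major scientific tasks of LAMOST, a major national scientific engineering project. Centered on the key scientific problem of "searching for special celestial objects with tracer properties and understanding the unknown laws of the universe", it studies the mining of massive, high-dimensional celestial spectral data and its parallelization. The main research topics are: local outlier mining algorithms for massive, high-dimensional data based on subspaces and attribute correlation, and their parallelization; frequent pattern mining algorithms based on an address table and the FIUT tree structure, and their parallelization; reduction and representation of association rules for celestial spectral data; performance optimization and load balancing of data mining algorithms in cluster environments; parallel outlier mining techniques for massive, high-dimensional celestial spectra; parallelization of association-rule-based correlation analysis of massive, high-dimensional celestial spectral data; and a celestial spectral data mining system in the Hadoop environment. The research not only aims to provide an effective method and approach for massive, high-dimensional data mining, but is also expected to provide core supporting technology for further increasing the scientific output of LAMOST, for knowledge discovery of unknown special celestial spectral data and astronomical laws, and for the cross-identification of unknown special celestial objects.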
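Keywords (translated from Chinese): Massive high-dimensional data mining; celestial spectra; frequent patterns; local outliers; parallelization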
English abstract: Aimed at the three major scientific tasks of LAMOST, a major national scientific engineering project, this project studies the mining of massive, high-dimensional celestial spectral data and its parallelization, centered on the key scientific problem of "searching for special celestial objects with tracer properties and understanding the unknown laws of the universe". The main research topics are as follows: local outlier mining algorithms for massive, high-dimensional data sets based on subspaces and attribute correlation analysis, and their parallelization; frequent pattern mining algorithms based on an address table and the FIUT tree structure, and their parallelization; reduction and representation of association rules for celestial spectral data; performance optimization and load balancing of data mining algorithms in cluster environments; parallel outlier mining techniques for massive, high-dimensional celestial spectra; parallelization of association-rule-based correlation analysis of massive, high-dimensional celestial spectral data; and a celestial spectral data mining system in the Hadoop environment. The research not only aims to provide an effective method and approach for massive, high-dimensional data mining, but is also expected to provide core supporting technologies for improving the scientific o
English keywords: Massive and High Dimensional Data Mining; Celestial Spectrum; Frequent Pattern; Local Outlier; Parallelization