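Project title: Research on Spatial Clustering Methods in a Big Data Environment
Project number: No.41301402
Project type: Young Scientists Fund project
Approval year: 2014
Discipline: Astronomy, Earth Sciences
Project author: Fu Yan (付艳)
Affiliation: Beijing Normal University
Funding: 250,000 CNY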
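Chinese abstract (translated): Spatial clustering is an important branch of spatial data mining. Its goal is to discover hidden patterns or identify similar regions in spatial databases. With the rapid development of spatial information processing technology, big data of unprecedented scale will inevitably bring new challenges to clustering research. This project studies spatial clustering methods for the big data environment, covering two topics: (1) using the parallel computing framework MapReduce to port existing clustering algorithms to a distributed computing platform for the first time, meeting the scientific computing needs of spatial big data and laying a foundation for clustering larger-scale, higher-dimensional spatial and spatio-temporal data; (2) using the LDA model, for the first time, to design an online spatial clustering algorithm that not only meets the parallel computing needs of big data but also addresses high dimensionality and heavy noise, providing a basis for spatio-temporal information mining and trend prediction. In a big data environment, advanced data mining techniques make it easier for researchers in the spatial sciences to explore the spatio-temporal patterns in their data and to deepen their understanding of the complex evolution of the Earth system. The research in this project is therefore of significant theoretical value and has broad application prospects for discovering, more comprehensively, the latent relationships and patterns of change among spatial data attributes.
Chinese keywords (translated): spatial big data; clustering; distributed computing; deep learning; transfer learning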
English abstract: Spatial clustering is an important part of spatial data mining. Its goal is to find hidden patterns or similar regions in spatial databases. With the rapid development of information technology for spatial data, big data will undoubtedly bring many new challenges to spatial clustering research. This proposal addresses spatial clustering on big data in two parts: (1) Based on the parallel computing framework MapReduce, we will, for the first time, try to port existing spatial clustering methods to a distributed computing platform and check whether they work. This port aims to satisfy the requirements of scientific computing on big data; if it succeeds, the output could support spatial and spatio-temporal clustering research on larger, higher-dimensional data sets. (2) We will, for the first time, introduce the LDA model to propose an online spatial clustering method. The method could meet the performance requirements of big data processing, and it also avoids some problems caused by spatial data, such as high dimensionality and heavy noise. This work will support research on spatio-temporal mining and trend prediction. With big data, good data mining techniques could help researchers explore spatio-temporal patterns easily…
英文关键词: spatial big data;clustering;distributed computing;deep learning;transfer learning
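To illustrate what porting an existing clustering algorithm to MapReduce (item 1 above) can look like, here is a minimal sketch that simulates one k-means iteration per MapReduce round in plain Python. The choice of k-means and all names (map_phase, reduce_phase, kmeans_mapreduce) are hypothetical assumptions for illustration only; the project does not state which algorithms it ports, and a real deployment would run the mapper and reducer as Hadoop tasks rather than local loops.

```python
from collections import defaultdict
from math import dist


def map_phase(points, centroids):
    """Mapper (hypothetical): emit (nearest-centroid index, (x, y, 1)) per point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, (p[0], p[1], 1)


def reduce_phase(key, values):
    """Reducer (hypothetical): sum coordinates and counts, return the new centroid."""
    sx = sy = n = 0
    for x, y, c in values:
        sx += x
        sy += y
        n += c
    return key, (sx / n, sy / n)


def kmeans_mapreduce(partitions, centroids, iterations=10):
    """Run k-means as repeated map -> shuffle -> reduce rounds.

    Each element of `partitions` stands in for the data block one map task
    would read; the dict `groups` plays the role of the shuffle step.
    """
    centroids = list(centroids)
    for _ in range(iterations):
        groups = defaultdict(list)
        for part in partitions:
            for key, value in map_phase(part, centroids):
                groups[key].append(value)
        new_centroids = list(centroids)  # empty clusters keep their old centroid
        for key, values in groups.items():
            k, c = reduce_phase(key, values)
            new_centroids[k] = c
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids


if __name__ == "__main__":
    partitions = [[(0, 0), (0, 1)], [(10, 10), (10, 11)], [(1, 0), (11, 10)]]
    print(kmeans_mapreduce(partitions, [(0.0, 0.0), (10.0, 10.0)]))
```

Splitting the work this way keeps each mapper independent of the others, which is what lets the assignment step scale out across a cluster; only the small per-centroid sums cross the network to the reducers.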