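Project title: Storage and Indexing for Supporting Visual Analysis of Massive Unstructured Data
Project number: No.61070051
Project type: General Program
Year of approval: 2011
Project discipline: Weapons industry
Project author: Qian Weining (钱卫宁)
Author affiliation: East China Normal University (华东师范大学)
Project funding: CNY 110,000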
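Chinese abstract: Visual analysis of unstructured data is one of the key problems in making use of data in a "big data" environment. This project addresses the defining characteristics of that problem: large data volume, incomplete structural information, and the need for immediate, interactive visualization. From a data management perspective, and taking as its starting point the support that storage and indexing provide for the similarity, aggregation, and interactive queries required by visual analysis, the project studied key techniques including 1) semantic modeling of unstructured data for visual analysis; 2) query operations and a query language for unstructured data that support visual analysis; 3) distributed storage and indexing of massive unstructured data that support interactive queries; and 4) distributed maintenance of statistics over unstructured data that supports on-the-fly analysis. Based on real massive unstructured data (2 TB of microblog data), a prototype visualization system for analyzing collective user behavior was developed. With the support of this grant, team members published 8 papers at international conferences or in academic journals (including 1 ICDE 2012 conference paper), filed 1 patent application, filed 1 software copyright application, completed 1 standard benchmark set for unstructured data analysis, trained 2 master's students, received the Best Demo Award Runner-Up at the international conference DASFAA 2011, and received the Best Poster Award at the international conference SocInfo 2011. The research content and technical approach are consistent with the original plan; the project outcomes meet the requirements of the project task statement; project management and use of funds comply with the relevant regulations.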
Chinese keywords: Unstructured data; massive data management; storage and indexing; visual analysis
English abstract: Visual analysis of unstructured data is a key issue in taking full advantage of Big Data. This project addresses the challenges of huge data volume and missing structural information, together with the requirements of on-demand and interactive visualization. To support the similarity, clustering, and interactive query processing needed for visual analytics, storage and indexing methods for massive unstructured data were studied. Research results include: 1) semantic modeling for unstructured data; 2) query operators and a query language for visual analytics; 3) distributed storage and indexing over unstructured data for interactive query processing; 4) distributed statistics maintenance for on-demand analytical queries. Based on real-life massive unstructured datasets (2 TB of microblog data), a visual analysis prototype system for collective behavior research was implemented. With the support of the project, eight research papers were published, including one ICDE 2012 full research paper, and one patent application and one software copyright application were submitted. The project members developed a benchmark for unstructured data analysis. The project members were awarded the DASFAA 2012 Best Demo Award Runner-Up and the SocInfo 2012 Best Poster Award. The research work is consistent with the pre-defined project tasks. The management of the project follows the rules of NSFC.
English keywords: Unstructured data; massive data management; storage and indexing; visual analysis