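Project title: Research on Uncertain Data Management Methods for Online Public Opinion Analysis
Project number: No. 61202214
Project type: Young Scientists Fund
Year of approval: 2013
Discipline: Computer Science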
Project author: Tieying Zhang (张铁赢)
Author affiliation: Institute of Computing Technology, Chinese Academy of Sciences
Project funding: CNY 250,000
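Chinese abstract (translated): As online public opinion applications reach into every area of the national economy and people's livelihood, the uncertain data involved in public opinion analysis has grown explosively and now exceeds the petabyte scale. However, current data management methods designed for certain data cannot meet the needs of public opinion analysis: their analysis functions are incomplete and their analysis efficiency is low, so they cannot accurately judge, analyze, and predict public opinion trends. Driven by the requirements of public opinion analysis and making full use of cloud computing and cloud storage technologies, this project studies the basic theory and methods of large-scale uncertain data management for public opinion analysis. It focuses on four key parts: uncertain data models and data integration methods; distributed storage strategies and dynamic indexing mechanisms for uncertain data; parallel query processing algorithms and mechanisms for uncertain data; and object caching mechanisms for uncertain data. The results are expected not only to resolve the uncertain data management bottleneck encountered in public opinion analysis, but also to carry methodological significance for building large-scale online public opinion data centers and to serve as a reference for uncertain data problems arising in other disciplines.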
Chinese keywords (translated): concurrency control; data model; distributed database; graph data management; query analysis
English abstract: With the development of public opinion applications, the uncertain data used in public opinion analysis is growing rapidly and has exceeded the petabyte scale. However, current management methods for certain data cannot meet the requirements of public opinion analysis because of their incomplete functionality and low efficiency, so these traditional methods cannot analyze public opinion accurately. This project takes the requirements of public opinion analysis into full consideration and takes full advantage of cloud computing and cloud storage in order to study the basic theories and methods of uncertain data management for public opinion analysis. We focus on uncertain data models and data integration, distributed storage methods and dynamic index mechanisms for uncertain data, parallel query algorithms, and object cache schemes for uncertain data. The project aims not only to solve the bottleneck of uncertain data management in public opinion analysis but also to provide a method for building large-scale public opinion data centers. It should also be a useful reference for uncertain data management problems in other fields.
English keywords: concurrency control; data model; distributed database; graph data management; query analysis