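Project title: Research on Key Technologies for an Unstructured Data Management System on Large-Scale Distributed Memory
Project number: No.61300003
Project type: Young Scientists Fund project
Approval year: 2014
Discipline: Automation technology, computer technology
Project author: Chen Wei (陈薇)
Affiliation: Peking University (北京大学)
Funding: 230,000 yuan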
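Chinese abstract: Unstructured data accounts for as much as 90% of the digital world, and every field shows an urgent need to manage and analyze it effectively. Existing cloud computing platforms process unstructured data at high throughput but with high response latency, so they cannot support online random access, interactive analysis, or mining. How to manage massive unstructured data effectively raises many theoretically unresolved system-design problems, and whether this can be achieved in large-scale distributed memory has drawn keen attention from both the database research community and industry. Targeting the heterogeneous, interlinked, and real-time nature of unstructured data, this project will study in depth a storage and access model for unstructured data on large-scale distributed memory and a parallel data computation framework for distributed in-memory environments. It will design an unstructured data management system for large-scale distributed memory and realize a low-latency, high-throughput access service model for unstructured data based on Distributed In-Memory Data Storage. The results will be implemented in LUDAS, the "massive unstructured data management and analysis system" developed at Peking University, and validated in a real large-scale environment (a 100-server cluster with PB-scale data).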
Chinese keywords: Unstructured data management; In-memory computing; RDF; Data flow optimization
English abstract: Unstructured data accounts for as much as 90% of the digital world, and most fields show an urgent need for its effective management and analysis. Existing cloud computing platforms achieve large-scale, high-throughput data processing but suffer from high response latency, so they cannot meet the needs of online random access, interactive analysis, and mining over massive data. From a system design perspective, many theoretically unresolved issues remain in how to manage massive unstructured data effectively. Whether this problem can be solved in a large-scale distributed memory environment has attracted keen attention from both the database research community and industry. This project will address the heterogeneous, associated, and real-time features of unstructured data, and will study in depth a large-scale distributed in-memory storage model for unstructured data and a distributed in-memory parallel data computation framework. Our goal is to design a large-scale distributed in-memory unstructured data management system and to realize a low-latency, high-throughput unstructured data access and service model based on distributed in-memory data storage. The achievements of this project will be implemented in "Large-Scale Unstructured Data
English keywords: Unstructured data management; In-memory computing; RDF; Data flow optimization