Project title: Research on NoSQL-Based Distributed Storage Technology for Massive Solar Observation Data
Project number: No.11263004
Project type: Regional Science Fund Project
Year of initiation/approval: 2013
Project discipline: Mathematical and Physical Sciences, and Chemistry
Project author: Ji Kaifan (季凯帆)
Author affiliation: Kunming University of Science and Technology (昆明理工大学)
Project funding: 640,000 CNY
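Abstract (translated from Chinese): Traditional centralized data storage can no longer meet the demands of modern solar observation, which produces data at hundreds of megabytes per second. Based on the Chengjiang infrared solar tower (NVST), this project studies NoSQL-based distributed data storage technology to achieve fast, secure, and reliable storage, management, retrieval, reading, and maintenance of massive solar observation data, while accommodating dynamic, rapid data growth and the need for real-time processing of stored data. The project focuses on four areas: a NoSQL-based distributed storage architecture for astronomical data, techniques for guaranteeing data integrity in distributed storage, high-performance parallel read/write techniques, and techniques for conveniently adding and removing storage nodes. It aims for breakthroughs in key technologies, including data storage and retrieval that combine Key-Value and B+ trees, optimization methods for sharded storage, and algorithms for real-time task allocation and task scheduling in NoSQL. The project's innovations lie in introducing a new storage architecture that uses distributed storage and NoSQL technology to achieve fast and reliable storage, reading, management, and capacity expansion for massive solar observation data, and in using a Key-Value mechanism with B+ trees to implement write-once recording and querying of observation results in order to guarantee data consistency and security.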
Chinese keywords (translated): massive astronomical data;distributed storage;solar images;non-relational database;parallel computing
English abstract: Modern solar observations produce high-volume data at very high rates, such as a few hundred MB per second and several TB per day. However, traditional centralized data storage technology can no longer meet this demand in scalability, availability, and performance. In this proposal, NoSQL distributed storage mechanisms are designed for the New Vacuum Solar Telescope in Chengjiang, Yunnan, China. The goal is to provide a fast, secure, and reliable way to store, manage, retrieve, read, and maintain massive amounts of solar observation data, while meeting the requirements of dynamic data growth and real-time processing. Four areas will be researched: the architecture of NoSQL distributed storage, techniques for distributed data integrity, high-performance parallel data reading and writing, and the convenient addition and removal of storage nodes. Key technologies include data storage and retrieval combining Key-Value and B+ tree structures, optimization methods for distributed data storage, and algorithms for real-time task allocation and task scheduling with NoSQL. The innovative points of the proposal are the use of distributed storage and NoSQL mechanisms for storing and reading massive solar observation data, and the use of Key-Value and B
English keywords: astronomical mass data;distributed storage;solar images;NoSQL database;parallel processing