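Project title: Research on Methods and Systems for Efficient Retrieval of Time-Series Image Subsets in Astronomical Big Data
Project number: No.U1531111
Project type: Joint Fund Project
Year of approval: 2016
Discipline: Astronomy; Earth Sciences
Principal investigator: Yu Ce (于策)
Affiliation: Tianjin University
Funding: 470,000 CNY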
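Chinese abstract (translated): In the big-data era of astronomy, scientific data are more abundant than ever and growing rapidly, and efficiently obtaining the required data subsets is a prerequisite for any specific study. Driven mainly by the needs of image-based time-domain astronomy, built on high-performance storage and indexing technologies, and using real observational data and science problems as test cases, this project studies methods and a scalable system architecture for efficiently extracting time-series astronomical image subsets in a big-data environment. The main content and innovations are: building a high-performance metadata structure and index system, without changing the archived image data in any way, to locate the files that hold the required data; studying efficient methods for reading partial data from FITS image files, so that local image data are read at minimal cost; designing a high-performance cache system that improves the access performance and usability of retrieved time-series image data through data-layout optimization and data-lifecycle management; and designing system architectures for data sets of different scales, supporting both single-storage-node and distributed storage environments. The results can directly serve astronomical research that requires time-series image collections as well as large-scale astronomical image data management, and can serve as a reference for the management and high-performance retrieval of other types of astronomical science data.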
Chinese keywords (translated): Time-domain astronomy;data management;astronomical image data;index;data storage
English abstract: Astronomical research has entered the big data era: astronomical science data are abundant and growing rapidly. Efficient retrieval of the required data subsets is a prerequisite for any specific study. This project focuses on image-based time-domain astronomy and investigates efficient subset retrieval methods and a scalable system architecture for astronomical image data. The work builds on high-performance storage and indexing technologies and uses actual observational data and science topics as test cases. The main contents and innovations are as follows: a high-performance metadata and indexing system, built without any modification to the archived raw image data, that locates the files needed; an efficient partial-read method for FITS image files that retrieves a specific region of a large image at minimal cost; and a high-performance cache for the queried time-series image data, optimized through data layout and data lifecycle management. The system architecture is designed for data sets of different scales and supports both single-node and distributed storage environments. The expected results can directly support astronomical research on time-series image data and the management of massive astronomical image data, and can serve as a reference for the management and high-performance retrieval of other types of astronomical science data.
English keywords: Time Domain Astronomy;Data Management;Astronomical Image Data;Index;Data Storage