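Project title: Theory and Techniques of Bloom Filters for Multi-dimensional Data
Project number: No.61472194
Project type: General Program
Year approved: 2015
Discipline: Automation technology, computer technology
Project author: Qian Jiangbo (钱江波)
Affiliation: Ningbo University
Funding: 800,000 CNY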
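Chinese abstract: Data filtering technology can rapidly extract valuable data from massive static or dynamic data for further processing, which makes it a highly effective tool in the current era of data explosion. Although filters for single-dimensional data have been studied and applied for many years, research on filters for multi-dimensional data is still scarce and concentrates mainly on the set-membership problem for low-dimensional data. Anticipating future needs in data management, this project takes multi-dimensional data as its object of study and explores the theory and implementation techniques of high-performance data filters, aiming to provide advanced and practical solutions for big data processing. The research covers: (1) the concept and method of associative deletion for Bloom filters on low-dimensional data, a method that also applies to other operations such as semi-joins and sliding-window updates; (2) a multi-granularity locality-sensitive Bloom filter scheme that handles the different filtering granularities required for high-dimensional data; (3) a hardware filter that combines the efficiency of hardware and parallel computing to accelerate the front-end data processing pipeline; (4) MapReduce-based batch processing and pipeline acceleration methods for back-end data processing servers. The research is original and has significant theoretical and practical value for increasing data processing speed and extending the theory and methods of data management.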
Chinese keywords: multi-dimensional data;Bloom filter;parallel computing;hardware acceleration;locality-sensitive hash functions
英文摘要: With data filtering technology, valuable data can be fast purified from static or dynamic big data for further processing. This technology is a very effective tool in the current era of data explosion. Although the data filters for single dimension data have been researched and used for many years, the research on filters for multi-dimensional data is being seldom studied, even the minor contribution is mainly from the judgment of belonging to a low-dimensional data set. Focusing on processing multi-dimensional data, we propose some new theories and implementation techniques for high-performance data filters. The study includes: (1) We propose Bloom filter based associative deletion theory and algorithms for low-dimensional data. This theory can also provide direct calculation method for many other operations, such as semi-join, update of sliding-window, etc. (2) We propose theories and algorithms of multi-granularity locality-sensitive Bloom filter for high-dimensional data. (3) We propose a new hardware coprocessor using pipeline acceleration for filtering in front-ends of data processing. (4) We propose batch processing and pipeline processing methods in the MapReduce framework for filtering acceleration in a back-end data processing server. The study is a project of originality and will contribute significance theories and techniquies for data processing.
English keywords: multi-dimensional data;Bloom filter;parallel computing;hardware acceleration;locality-sensitive hashing
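Neither abstract explains how a locality-sensitive Bloom filter works, so the following is only a rough illustration of the general idea, not the project's multi-granularity scheme, which the abstract does not describe. In this minimal sketch, k locality-sensitive hash functions each map a vector to a bucket, and each bucket is then mapped to one bit of a shared m-bit array. The choice of the p-stable (Gaussian) hash family h(v) = floor((a·v + b) / w) is an assumption. The class name `LocalitySensitiveBloomFilter` and the parameters `m`, `k`, `dim`, `w`, and `seed` are hypothetical and are not taken from the project.

```python
import math
import random
from typing import List, Sequence


class LocalitySensitiveBloomFilter:
    """Hypothetical sketch: an m-bit array indexed by k p-stable LSH functions.

    Assumed parameters: m (bits), k (hash functions), dim (vector
    dimensionality), w (LSH bucket width), seed (for reproducibility).
    """

    def __init__(self, m: int, k: int, dim: int, w: float, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.m = m
        self.w = w
        self.bits = [False] * m
        # Each LSH function is h_i(v) = floor((a_i . v + b_i) / w),
        # with a_i drawn from N(0, 1)^dim and b_i drawn from U[0, w).
        self.a: List[List[float]] = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)
        ]
        self.b: List[float] = [rng.uniform(0.0, w) for _ in range(k)]

    def _positions(self, v: Sequence[float]) -> List[int]:
        """Return the k bit positions for vector v (one per LSH function)."""
        positions = []
        for i, (a_i, b_i) in enumerate(zip(self.a, self.b)):
            projection = sum(x * y for x, y in zip(a_i, v))
            bucket = math.floor((projection + b_i) / self.w)
            # Mix in the function index so different functions spread over the array;
            # hashing a tuple of ints is deterministic in Python.
            positions.append(hash((i, bucket)) % self.m)
        return positions

    def add(self, v: Sequence[float]) -> None:
        """Set the k bits selected for v."""
        for p in self._positions(v):
            self.bits[p] = True

    def query(self, v: Sequence[float]) -> bool:
        """True if all k bits for v are set, i.e. v is probably near a stored vector."""
        return all(self.bits[p] for p in self._positions(v))


if __name__ == "__main__":
    lsbf = LocalitySensitiveBloomFilter(m=1024, k=8, dim=3, w=4.0, seed=42)
    print(lsbf.query([1.0, 2.0, 3.0]))  # False: the filter is still empty
    lsbf.add([1.0, 2.0, 3.0])
    print(lsbf.query([1.0, 2.0, 3.0]))  # True: a stored vector is never missed
    # A nearby vector usually, but not always, hits the same k bits.
    print(lsbf.query([1.05, 2.0, 2.95]))
```

In this sketch, a larger `w` produces coarser buckets, so vectors that lie farther apart are more likely to share all k bits. The false-positive behaviour therefore depends on `m`, `k`, and `w` together. As in a standard Bloom filter, a vector that was actually inserted always passes `query`.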