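Project title: Research on Privacy Protection Techniques for Data Queries in Big Data Environments
Project number: No.61472131
Project type: General Program (面上项目)
Approval year: 2015
Discipline: Automation technology; computer technology
Principal investigator: Qin Zheng (秦拯)
Affiliation: Hunan University (湖南大学)
Funding: CNY 820,000 (82万元)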
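Chinese abstract (translated): Preventing the leakage of sensitive information is a basic requirement of data query and analysis. In the big data era, data storage structures, the intrinsic relationships among data, and access methods have all changed considerably, so existing data privacy protection techniques are often difficult to apply effectively to big data queries. Based on characteristics of big data environments such as dynamic storage models and sparse value density, this project addresses two main kinds of privacy leakage during queries, direct privacy theft and correlation-based privacy inference, and studies privacy protection techniques for data queries in big data environments. Against direct privacy theft, the project will analyze and model the storage structures and access methods of big data, design a PP-Tree index structure that hides sensitive information, and study information-hiding algorithms based on that index, so as to protect the privacy of massive, heterogeneous raw sensitive data. Against correlation-based privacy inference, it will analyze the implicit relationship characteristics of big data, study and model methods for quantifying implicit correlations, propose privacy protection methods based on these implicit correlations, build a big data query engine, and experimentally evaluate and optimize the related models and algorithms. The research will provide theoretical methods and technical means for privacy protection in big data queries and is of great significance for promoting the healthy and rapid development of big data.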
Chinese keywords (translated): Big data; privacy protection; index; implicit correlation
English abstract: Privacy protection is a crucial requirement in many data query and analysis scenarios. Existing privacy protection techniques are generally inapplicable to big data applications because the storage structures, the access schemes, and the relationships among data differ substantially from those of traditional data applications. Based on characteristics of the big data environment, such as dynamic storage models and sparse value density, this project investigates techniques that protect privacy in big data queries against two malicious behaviors: direct privacy theft and privacy inference based on data correlations. To counter direct privacy theft, the project proposes to analyze and model the storage structures and query schemes of big data and to design a privacy-preserving tree (PP-Tree) index with corresponding indexing algorithms that hide sensitive information, thereby protecting the privacy of large-scale heterogeneous raw data. To counter inference attacks, the project proposes to analyze the characteristics of implicit correlations in big data, build a quantitative model of those correlations, and design privacy protection techniques based on them. The project also proposes to build a privacy-preserving query engine for big data and to evaluate and optimize the proposed models and algorithms experimentally. Together, these results are intended to provide theoretical methods and practical techniques for privacy protection in big data queries and to support the healthy development of big data applications.
English keywords: Big data; Privacy protection; Index; Implicit correlation