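Project title: Privacy-Preserving Mining of Massive Data (保护隐私的海量数据挖掘)
Project number: No.61202427
Project type: Young Scientists Fund
Year of approval: 2013
Discipline: Computer Science
Project author: Sang Yingpeng (桑应朋)
Author affiliation: Sun Yat-sen University (中山大学)
Funding amount: CNY 250,000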
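Abstract (translated from Chinese): The rapid development of information technology has led many industries and organizations to accumulate massive amounts of data. Mining this data without leaking commercial secrets or user privacy is one of the core problems that urgently needs solving in practical applications such as information sharing and knowledge discovery. The main shortcomings of existing research are that transforming massive data is too inefficient and that mining does not adequately account for the uncertainty introduced by the transformation. Targeting the massive and heterogeneous nature of the data, this project proposes systematic and innovative solutions for both data publishers and data miners. For data publishers, the project will propose privacy-preserving transforms that combine high efficiency, high security, and high data utility. For data miners, the project will study new data mining methods that apply to both single-party and joint mining, reconcile the uncertainty that privacy-preserving transforms introduce into the data, prevent malicious participants from sabotaging the mining process, and are equipped with a MapReduce-based parallel mechanism. The results of the project will be highly applicable to sectors that urgently need information sharing but are constrained by privacy protection regulations, such as finance and insurance, telecommunications, and healthcare, and will effectively promote the flow of information across industries, bringing considerable economic and social benefits.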
Keywords (translated from Chinese): data privacy protection; data mining; heterogeneous massive data
English abstract: With the rapid development of information and communications technology, tera-scale data has accumulated across government and private sectors. Mining such data without leaking commercial secrets or individual privacy has become a critical problem in information sharing and knowledge discovery applications. The major shortcomings of state-of-the-art research are low efficiency in transforming tera-scale private data and a failure to account for the uncertainty those transforms introduce. Targeting the tera-scale size and heterogeneity of private data, this project proposes systematic and novel solutions for both data publishers and data miners. For data publishers, the project will provide privacy-preserving transforms with high efficiency, high security, and high data utility. For data miners, the project will provide new data mining approaches that suit both single-miner and federated settings, reconcile the uncertainty introduced by the privacy-preserving transforms, thwart sabotage of the mining process by malicious participants, and come with a parallel mining framework based on MapReduce. The outcomes of the project will be especially applicable to sectors with a high demand for information sharing while restricted
English keywords: Data privacy protection; Data mining; Heterogeneous and tera-scale data