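Project title: Research on Incremental and Parallel Learning Algorithms for Privacy-Preserving Support Vector Machines in Large-Scale Data Mining
Project number: No.61202152
Project type: Young Scientists Fund
Approval year: 2013
Discipline: Computer Science
Principal investigator: Duan Hua (段华)
Affiliation: Shandong University of Science and Technology (山东科技大学)
Funding: CNY 240,000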
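Abstract (translated from Chinese): Privacy protection is an important research topic in data mining, and the privacy-preserving support vector machine (PPSVM) is attracting growing attention. Because the data sets that PPSVM handles must be kept confidential while still reflecting the real situation, research on PPSVM algorithms differs from that on ordinary SVM. To improve learning efficiency on large-scale data sets, this project studies incremental learning algorithms for PPSVM and the parallelization of its learning algorithms. First, it studies effective encryption measures for large-scale data sets, forming data sets that are locally independent and cooperative as a whole. Second, it constructs feasible solution algorithms for PPSVM, in particular exploring the application of the SOR method to solving PPSVM on large-scale data sets. Third, it introduces incremental learning techniques for PPSVM to reduce storage space and increase training speed. Finally, it studies parallel algorithms for PPSVM so that very large-scale classification problems can be solved quickly. The project will establish the theoretical foundations of these problems and study the equivalence of the model solutions and the convergence of each algorithm. It will also study the optimal selection of parameters so that PPSVM algorithms achieve their best performance. The research results will be validated through application in industries such as banking and insurance.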
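Keywords (translated from Chinese): Support Vector Machine; Privacy Protection; Large-scale Data Mining; Incremental Learning; Process Mining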
English abstract: Privacy protection is an important research topic in the field of data mining, and the Privacy Preserving Support Vector Machine (PPSVM) is attracting increasing attention. Because the data sets processed by PPSVM must be kept confidential while still reflecting the true situation, algorithm research for PPSVM differs from that for general SVM. To improve learning efficiency on large-scale data sets, the project focuses on incremental and parallel learning algorithms for PPSVM. The first problem is the effective encryption of large data sets, so as to obtain data sets that are locally independent yet coordinated as a whole. Second, feasible solution algorithms are constructed for PPSVM; for large-scale data sets in particular, the SOR method is introduced for solving the PPSVM model. The third problem concerns incremental learning algorithms for PPSVM, which reduce storage space and improve training speed. Finally, parallel learning algorithms are constructed for PPSVM so that it can quickly solve very large-scale classification problems. The goal of the project is to establish the theoretical basis of the above problems and to prove the equivalence of the solutions of the model and the convergence of the algorithms. Another goal is to discuss
English keywords: Support Vector Machine; Privacy Protection; Large-scale Data Mining; Incremental Learning; Process Mining