Personally identifiable information (PII) can reach cyberspace through many channels, and many potential sources can leak it. Data sharing (e.g., cross-agency data sharing) for machine learning and analytics is an important component of data science. However, because of privacy concerns, data should be protected with strong privacy guarantees before it is shared. Various privacy-preserving approaches have been developed for data sharing; however, identifying the best approach for a given dataset remains a challenge. Several parameters influence the efficacy of the process, such as the characteristics of the input dataset, the strength of the privacy-preservation approach, and the expected level of utility of the resulting dataset for the corresponding data mining application (such as classification). This paper presents a framework named Privacy Preservation as a Service (PPaaS) to reduce this complexity. The proposed method employs selective privacy preservation via data perturbation and considers the different dynamics that can influence the quality of the privacy preservation of a dataset. PPaaS includes pools of data perturbation methods, and for each application and input dataset, it selects the most suitable data perturbation approach after rigorous evaluation. It enhances the usability of the privacy-preserving methods within its pool; it is a generic platform that can sanitize big data in a granular, application-specific manner by employing a suitable combination of diverse privacy-preserving algorithms to provide a proper balance between privacy and utility.
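The selection step described above (a pool of perturbation methods, evaluated per dataset, with the most suitable one chosen) can be illustrated with a minimal Python sketch. This is not the PPaaS implementation: the abstract does not specify the perturbation algorithms or the evaluation metrics, so the function names (`additive_noise`, `rounding`, `privacy_score`, `utility_score`, `select_method`), the two toy perturbations, and both scoring proxies are assumptions made purely for illustration.

```python
import random
import statistics
from typing import Callable, Dict, List, Optional

Column = List[float]
Perturbation = Callable[[Column, random.Random], Column]


def additive_noise(scale: float) -> Perturbation:
    """Hypothetical pool member: add zero-mean Gaussian noise to each value."""
    def perturb(values: Column, rng: random.Random) -> Column:
        return [v + rng.gauss(0.0, scale) for v in values]
    return perturb


def rounding(step: float) -> Perturbation:
    """Hypothetical pool member: snap each value to the nearest multiple of step."""
    def perturb(values: Column, rng: random.Random) -> Column:
        return [round(v / step) * step for v in values]
    return perturb


def privacy_score(original: Column, perturbed: Column) -> float:
    """Assumed privacy proxy: mean absolute per-record distortion."""
    return statistics.fmean(abs(o - p) for o, p in zip(original, perturbed))


def utility_score(original: Column, perturbed: Column) -> float:
    """Assumed utility proxy: how well the column mean is preserved, in (0, 1]."""
    drift = abs(statistics.fmean(original) - statistics.fmean(perturbed))
    return 1.0 / (1.0 + drift)


def select_method(
    data: Column,
    pool: Dict[str, Perturbation],
    min_privacy: float,
    seed: int = 0,
) -> Optional[str]:
    """Return the pool member with the highest utility among those that
    reach min_privacy, or None if no member qualifies."""
    best_name: Optional[str] = None
    best_utility = -1.0
    for name, method in pool.items():
        rng = random.Random(seed)  # same seed for every candidate
        perturbed = method(data, rng)
        if privacy_score(data, perturbed) < min_privacy:
            continue  # not private enough for this request
        score = utility_score(data, perturbed)
        if score > best_utility:
            best_name, best_utility = name, score
    return best_name


if __name__ == "__main__":
    data = [float(i) for i in range(100)]
    pool = {
        "noise_small": additive_noise(1.0),
        "noise_large": additive_noise(50.0),
        "round_10": rounding(10.0),
    }
    print(select_method(data, pool, min_privacy=0.5))
```

In this sketch the privacy requirement acts as a hard constraint and utility as the ranking criterion; a real deployment would replace both proxies with application-specific measures, for example classification accuracy on the perturbed data.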