Applying machine learning (ML) to sensitive domains requires protecting the privacy of the underlying training data through formal frameworks such as differential privacy (DP). Yet this protection usually comes at the cost of the resulting ML models' utility. One reason is that DP uses a single homogeneous privacy budget ε for all training data points, which must match the strictest privacy requirement among all data holders. In practice, different data holders may have different privacy requirements, and data points from holders with weaker requirements could contribute more information to the training of the ML models. To account for this possibility, we propose three novel methods that extend the DP framework Private Aggregation of Teacher Ensembles (PATE) to support training an ML model with personalized privacy guarantees across the training data. We formally describe the methods, provide theoretical analyses of their privacy bounds, and experimentally evaluate their effect on the final model's utility on the MNIST and Adult income datasets. Our experiments show that our personalized privacy methods yield higher-accuracy models than the non-personalized baseline. Our methods can thus improve the privacy-utility trade-off in scenarios where different data holders consent to contribute their sensitive data at different privacy levels.
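For context, the aggregation step that the proposed methods extend is PATE's noisy plurality vote over teacher models. Below is a minimal sketch of the standard, non-personalized Laplace noisy-max aggregation (as in the original PATE mechanism); the function name and the inverse-noise-scale parameter `gamma` are illustrative choices, not taken from the paper, and the paper's three personalization methods are not shown here.

```python
# Sketch of the non-personalized PATE noisy-max aggregation baseline.
# `gamma` is an assumed name for the inverse Laplace noise scale:
# larger gamma means less noise and a weaker privacy guarantee.
import numpy as np

def noisy_max_aggregate(teacher_votes: np.ndarray, gamma: float,
                        rng: np.random.Generator) -> int:
    """Return the Laplace-noised plurality label of the teacher ensemble.

    teacher_votes: shape (num_teachers,), each entry a class label in [0, C).
    """
    num_classes = int(teacher_votes.max()) + 1
    # Per-class vote counts, perturbed with Laplace noise of scale 1/gamma.
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    counts += rng.laplace(loc=0.0, scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(counts))

# Example: 250 teachers voting over 10 classes (e.g., MNIST digits).
rng = np.random.default_rng(0)
votes = rng.integers(0, 10, size=250)
print(noisy_max_aggregate(votes, gamma=0.1, rng=rng))
```

In the non-personalized baseline, a single `gamma` applies to every query, so the privacy cost accrues uniformly for all data holders; the paper's contribution is to relax exactly this uniformity.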