This paper explores affine hulls of points as a means of representing data via learning in Reproducing Kernel Hilbert Spaces (RKHS). The goal is to partition the data space into geometric bodies that conceal privacy-sensitive information about individual data points while preserving the structure of the original learning problem. To this end, we introduce the Kernel Affine Hull Machine (KAHM), which provides an effective way of computing a distance measure from the resulting bounded geometric body. KAHM is a critical building block in wide and deep autoencoders, which enable data representation learning for classification applications. To ensure privacy-preserving learning, we propose a novel method for generating fabricated data by smoothing differentially private data samples through a transformation process. The resulting fabricated data not only guarantees differential privacy but also ensures that the KAHM modeling error is no larger than that of the original training data samples. We also use fabricated data to address the accuracy loss that arises with differentially private classifiers. This approach significantly reduces the risk of membership inference attacks while incurring only a marginal loss of accuracy. As an application, we introduce a KAHM-based differentially private federated learning scheme in which evaluating the global classifier requires only locally computed distance measures. Overall, our findings demonstrate the potential of KAHM as an effective tool for privacy-preserving learning and classification.
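To make the geometric idea concrete, the following is a minimal sketch of the Euclidean distance from a query point to the plain affine hull of a set of points in input space. It is not the KAHM itself: the abstract describes a kernel-based model with a bounded geometric body, whereas an ordinary affine hull is unbounded and this sketch uses no kernel. The function name `affine_hull_distance` and the least-squares formulation are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def affine_hull_distance(points, query):
    """Distance from `query` to the affine hull of the rows of `points`.

    Assumption: plain Euclidean geometry in input space (no kernel, no
    boundedness), so this only illustrates the underlying idea.
    """
    points = np.asarray(points, dtype=float)
    query = np.asarray(query, dtype=float)

    # The affine hull is base + span(points[i] - base), so shift to the base.
    base = points[0]
    directions = (points[1:] - base).T  # shape (d, n - 1)
    offset = query - base

    # A single point has a zero-dimensional affine hull.
    if directions.shape[1] == 0:
        return float(np.linalg.norm(offset))

    # Project the offset onto the span of the directions by least squares;
    # the residual is the component orthogonal to the affine hull.
    coeffs, *_ = np.linalg.lstsq(directions, offset, rcond=None)
    residual = offset - directions @ coeffs
    return float(np.linalg.norm(residual))


# Three points spanning the plane z = 0 in 3-D.
samples = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
d_off_plane = affine_hull_distance(samples, [0.3, 0.4, 2.0])
d_in_plane = affine_hull_distance(samples, [5.0, -3.0, 0.0])
```

In this toy case the query above the plane is at distance 2 from the hull, while any point in the plane is at distance 0 however far it lies from the samples. That unboundedness is one reason a bounded body, as the abstract describes for KAHM, is the more useful object for a distance measure.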