We propose a novel theoretical and methodological framework for Gaussian process regression under privacy constraints. The method applies when a data owner is unwilling to share with the public a high-fidelity supervised learning model built from their data because of privacy concerns. The key idea is to add synthetic noise to the data until the predictive variance of the Gaussian process model reaches a prespecified privacy level. The optimal covariance matrix of the synthetic noise is formulated as a semidefinite program. We also formulate privacy-aware solutions under continuous privacy constraints using kernel-based approaches, and study their theoretical properties. We illustrate the method on a model that tracks satellite trajectories and on a real application to a census dataset.
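The key idea, adding synthetic noise until the predictive variance reaches a privacy level, can be illustrated with a deliberately simplified sketch. It is not the proposed method: in place of the optimal noise covariance from the semidefinite program, it assumes isotropic synthetic noise with covariance `extra_var * I`, a single test point, scalar inputs, and a squared-exponential kernel with unit prior variance. All function names and parameter values are hypothetical. Under these assumptions the predictive variance is non-decreasing in `extra_var`, so the smallest admissible noise level can be found by bisection.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel for scalar inputs (assumed for illustration)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def predictive_variance(xs, x_star, noise_var, extra_var):
    """GP posterior variance at x_star; extra_var is the synthetic noise variance."""
    n = len(xs)
    diag = noise_var + extra_var
    K = [[rbf(xs[i], xs[j]) + (diag if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [rbf(x, x_star) for x in xs]
    alpha = solve(K, k_star)  # (K + (noise_var + extra_var) I)^{-1} k_star
    return rbf(x_star, x_star) - sum(k * a for k, a in zip(k_star, alpha))


def minimal_isotropic_noise(xs, x_star, noise_var, privacy_level, tol=1e-8):
    """Smallest isotropic synthetic noise variance meeting the privacy level."""
    if predictive_variance(xs, x_star, noise_var, 0.0) >= privacy_level:
        return 0.0  # the model is already uncertain enough at x_star
    if privacy_level >= rbf(x_star, x_star):
        # The posterior variance can approach but never reach the prior variance.
        raise ValueError("privacy level must be below the prior variance")
    lo, hi = 0.0, 1.0
    while predictive_variance(xs, x_star, noise_var, hi) < privacy_level:
        hi *= 2.0  # grow the bracket until the constraint is satisfied
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if predictive_variance(xs, x_star, noise_var, mid) < privacy_level:
            lo = mid
        else:
            hi = mid
    return hi  # hi always satisfies the privacy constraint


# Hypothetical toy data: four inputs, one sensitive test location.
xs = [0.0, 0.5, 1.0, 1.5]
extra = minimal_isotropic_noise(xs, x_star=0.75, noise_var=0.01, privacy_level=0.2)
print(extra, predictive_variance(xs, 0.75, 0.01, extra))
```

A full noise covariance matrix, as in the semidefinite-programming formulation described above, has more degrees of freedom than this single scalar, so it could in principle meet the same constraint while distorting the data less. The sketch only shows the mechanism.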