We study the privacy risks associated with training a neural network's weights with self-supervised learning algorithms. Through empirical evidence, we show that the fine-tuning stage, in which the network weights are updated with an informative and often private dataset, is vulnerable to privacy attacks. To address these vulnerabilities, we design a post-training privacy-protection algorithm that adds noise to the fine-tuned weights, and we propose a novel differential privacy mechanism that samples noise from the logistic distribution. Compared to the two conventional additive-noise mechanisms, namely the Laplace and Gaussian mechanisms, the proposed mechanism uses a bell-shaped distribution resembling that of the Gaussian mechanism while satisfying pure $\epsilon$-differential privacy like the Laplace mechanism. We apply membership inference attacks to both unprotected and protected models to quantify the trade-off between the models' privacy and performance. We show that the proposed protection algorithm can effectively reduce the attack accuracy to roughly 50\%, equivalent to random guessing, while keeping the performance loss below 5\%.
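For concreteness, the following is a minimal sketch of an additive logistic-noise mechanism of the kind summarized above; the per-coordinate calibration $s = \Delta f / \epsilon$ parallels the standard Laplace calibration and is an illustrative assumption, not necessarily the exact parameterization used in the paper. For a query $f$ (here, the vector of fine-tuned weights) with $\ell_1$-sensitivity $\Delta f$,
\[
\mathcal{M}(D) = f(D) + (Z_1, \dots, Z_d), \qquad Z_i \overset{\text{i.i.d.}}{\sim} \mathrm{Logistic}(0, s), \qquad p(z) = \frac{e^{-z/s}}{s\left(1 + e^{-z/s}\right)^{2}}.
\]
Because $\left|\tfrac{d}{dz}\log p(z)\right| = \tfrac{1}{s}\left|\tanh\!\left(\tfrac{z}{2s}\right)\right| \le \tfrac{1}{s}$, shifting the noise vector by the sensitivity changes the total log-density by at most $\Delta f / s$, so the choice $s = \Delta f / \epsilon$ bounds the density ratio for neighboring datasets by $e^{\epsilon}$ and yields pure $\epsilon$-differential privacy, analogous to the Laplace mechanism.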