Empirical attacks on collaborative learning show that the gradients of deep neural networks can not only disclose private latent attributes of the training data but also be used to reconstruct the original data. While prior works have attempted to quantify the privacy risk stemming from gradients, their measures do not establish a theoretically grounded understanding of gradient leakage, do not generalize across attackers, and can fail to fully explain what is observed through empirical attacks in practice. In this paper, we introduce theoretically motivated measures to quantify information leakage in both attack-dependent and attack-independent manners. Specifically, we present an adaptation of the $\mathcal{V}$-information, which generalizes the empirical attack success rate and quantifies the amount of information that any chosen family of attack models can extract from the gradients. We then propose attack-independent measures, which require only the shared gradients, to quantify both original and latent information leakage. Our empirical results, on six datasets and four popular models, reveal that the gradients of the first layers contain the highest amount of original information, while the (fully-connected) classifier layers placed after the (convolutional) feature-extractor layers contain the most latent information. Further, we show how techniques such as gradient aggregation during training can mitigate information leakage. Our work paves the way for better defenses such as layer-based protection or strong aggregation.
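For reference, the standard $\mathcal{V}$-information that the attack-dependent measure builds on can be sketched as follows; the instantiation of $X$ as the shared gradients, $Y$ as the private (original or latent) information, and $\mathcal{V}$ as the chosen family of attack models is illustrative rather than the paper's exact formulation. For a predictive family $\mathcal{V}$ of functions $f$ mapping an observation $x$ (or the null input $\varnothing$) to a distribution over $y$,
\[
H_{\mathcal{V}}(Y \mid X) = \inf_{f \in \mathcal{V}} \mathbb{E}_{(x,y) \sim P_{X,Y}}\!\left[-\log f[x](y)\right],
\qquad
H_{\mathcal{V}}(Y \mid \varnothing) = \inf_{f \in \mathcal{V}} \mathbb{E}_{y \sim P_{Y}}\!\left[-\log f[\varnothing](y)\right],
\]
\[
I_{\mathcal{V}}(X \to Y) = H_{\mathcal{V}}(Y \mid \varnothing) - H_{\mathcal{V}}(Y \mid X).
\]
A larger $I_{\mathcal{V}}(X \to Y)$ indicates that attackers restricted to the family $\mathcal{V}$ can recover more information about $Y$ from the observed gradients $X$.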