We consider vertical logistic regression (VLR) trained with mini-batch gradient descent, a setting that has attracted growing interest in industry and has proven useful in a wide range of applications, including finance and medical research. We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks whose protocols may differ from one another but implicitly share a procedure for obtaining local gradients. We first consider the honest-but-curious threat model, in which we set aside the detailed implementation of the protocol and assume only the shared procedure, which we abstract as an oracle. We find that even in this general setting, single-dimension features and labels can still be recovered from the other party under suitable constraints on batch size, demonstrating the potential vulnerability of all frameworks that follow the same philosophy. We then examine a popular instantiation of the protocol based on Homomorphic Encryption (HE). We propose an active attack that significantly weakens the batch-size constraints of the previous analysis by generating and compressing auxiliary ciphertext. To address the privacy leakage within the HE-based protocol, we develop a simple yet effective countermeasure based on Differential Privacy (DP), and provide both utility and privacy guarantees for the updated algorithm. Finally, we empirically verify the effectiveness of our attack and defense on benchmark datasets. Altogether, our findings suggest that all vertical federated learning frameworks that depend solely on HE might contain severe privacy risks, and that DP, which has already demonstrated its power in horizontal federated learning, can also play a crucial role in the vertical setting, especially when coupled with HE or secure multi-party computation (MPC) techniques.
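To make the oracle abstraction concrete, the following is a minimal illustrative sketch, not the procedure analyzed in the paper. It assumes a hypothetical two-party setup in which a feature-only party A receives its plaintext mini-batch gradient from an oracle, and in which the batch size is no larger than A's feature dimension. Under those assumptions, A can solve a linear system for the per-sample residuals and read the labels from their signs. All names (`local_gradient_oracle`, `recover_labels`, `X_a`, `X_b`) are invented for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_gradient_oracle(X_a, X_b, w_a, w_b, y):
    """Hypothetical oracle: returns party A's mini-batch gradient.

    The gradient of the logistic loss w.r.t. A's weights is
    X_a^T (sigmoid(z) - y) / B, where z uses both parties' features.
    """
    residual = sigmoid(X_a @ w_a + X_b @ w_b) - y
    return X_a.T @ residual / len(y)

def recover_labels(X_a, grad_a):
    """Solve X_a^T r = B * grad_a for the residual r (assumes B <= d_a
    and X_a of full row rank), then infer labels from the sign of r:
    r = sigmoid(z) - y is negative exactly when y = 1."""
    B = X_a.shape[0]
    residual, *_ = np.linalg.lstsq(X_a.T, B * grad_a, rcond=None)
    return (residual < 0).astype(int)

# Synthetic data: batch size 4, party A holds 6 features, party B holds 3.
rng = np.random.default_rng(0)
B, d_a, d_b = 4, 6, 3
X_a = rng.normal(size=(B, d_a))
X_b = rng.normal(size=(B, d_b))
w_a = rng.normal(size=d_a)
w_b = rng.normal(size=d_b)
y = rng.integers(0, 2, size=B)  # labels held only by the other party

grad_a = local_gradient_oracle(X_a, X_b, w_a, w_b, y)
y_hat = recover_labels(X_a, grad_a)
print("recovered labels match:", np.array_equal(y_hat, y))
```

The sketch depends on the batch size being small relative to the feature dimension, which is one simple instance of the kind of batch-size constraint the abstract mentions. When the batch is larger than A's feature dimension, the linear system is underdetermined and this naive approach no longer pins down the residuals.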