Privacy attacks on Machine Learning (ML) models often focus on inferring the existence of particular data points in the training data. However, what the adversary really wants to know is whether a particular \emph{individual}'s (\emph{subject}'s) data was included during training. In such scenarios, the adversary is more likely to have access to the distribution of a particular subject's data than to its actual records. Furthermore, in settings like cross-silo Federated Learning (FL), a subject's data can be embodied by multiple data records spread across multiple organizations. Nearly all of the existing private FL literature is dedicated to studying privacy at two granularities -- item-level (individual data records) and user-level (participating users in the federation) -- neither of which applies to data subjects in cross-silo FL. This insight motivates us to shift our attention from the privacy of data records to the privacy of \emph{data subjects}, also known as subject-level privacy. We propose two black-box attacks for \emph{subject membership inference}, one of which assumes access to the model after each training round. Using these attacks, we estimate subject membership inference risk on real-world data for single-party models as well as FL scenarios. We find our attacks to be extremely potent, even without access to exact training records and with knowledge of membership for only a handful of subjects. To better understand the various factors that may influence subject privacy risk in cross-silo FL settings, we systematically generate several hundred synthetic federation configurations, varying properties of the data, model design and training, and the federation itself. Finally, we investigate the effectiveness of Differential Privacy in mitigating this threat.