Models can expose sensitive information about their training data. In an attribute inference attack, an adversary has partial knowledge of some training records and access to a model trained on those records, and infers the unknown values of a sensitive feature of those records. We study a fine-grained variant of attribute inference we call \emph{sensitive value inference}, where the adversary's goal is to identify with high confidence some records from a candidate set where the unknown attribute has a particular sensitive value. We explicitly compare attribute inference with data imputation that captures the training distribution statistics, under various assumptions about the training data available to the adversary. Our main conclusions are: (1) previous attribute inference methods do not reveal more about the training data from the model than can be inferred by an adversary without access to the trained model, but with the same knowledge of the underlying distribution as needed to train the attribute inference attack; (2) black-box attribute inference attacks rarely learn anything that cannot be learned without the model; but (3) white-box attacks, which we introduce and evaluate in the paper, can reliably identify some records with the sensitive attribute value that would not be identified without access to the model. Furthermore, we show that proposed defenses such as differentially private training and removing vulnerable records from training do not mitigate this privacy risk. The code for our experiments is available at \url{https://github.com/bargavj/EvaluatingDPML}.
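To make the threat model concrete, here is a minimal sketch of a black-box attribute inference attack of the general kind discussed above. It is not the paper's method: the function names (\texttt{infer\_attribute}, \texttt{predict\_proba}) and the confidence-maximization heuristic are illustrative assumptions. The adversary fills each candidate value of the unknown sensitive attribute into a partially known record, queries the target model, and guesses the value under which the model is most confident in the record's known true label:

```python
def infer_attribute(predict_proba, partial_record, attr_index,
                    candidate_values, true_label):
    """Hypothetical black-box attribute inference sketch.

    predict_proba: black-box query returning class probabilities for a record.
    partial_record: record with the sensitive attribute at attr_index unknown.
    Returns the candidate value that maximizes the model's confidence in
    the record's known true label, plus that confidence.
    """
    best_value, best_conf = None, -1.0
    for v in candidate_values:
        record = list(partial_record)       # copy the partial record
        record[attr_index] = v              # try this candidate value
        conf = predict_proba(record)[true_label]
        if conf > best_conf:
            best_value, best_conf = v, conf
    return best_value, best_conf
```

A sensitive-value-inference adversary would additionally rank candidate records by `best_conf` and report only the most confident ones, rather than guessing an attribute for every record. An imputation baseline, by contrast, would predict the attribute from the known features and distribution statistics alone, without querying the trained model.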