Neural networks pose a privacy risk to training data due to their propensity to memorise and leak information. Focusing on image classification, we show that neural networks also unintentionally memorise unique features even when they occur only once in training data. An example of a unique feature is a person's name that is accidentally present on a training image. Assuming access to the inputs and outputs of a trained model, the domain of the training data, and knowledge of unique features, we develop a score estimating the model's sensitivity to a unique feature by comparing the KL divergences of the model's output distributions given modified out-of-distribution images. Our results suggest that unique features are memorised by multi-layer perceptrons and convolutional neural networks trained on benchmark datasets, such as MNIST, Fashion-MNIST and CIFAR-10. We find that strategies to prevent overfitting (e.g.\ early stopping, regularisation, batch normalisation) do not prevent memorisation of unique features. These results imply that neural networks pose a privacy risk to rarely occurring private information. These risks can be more pronounced in healthcare applications if patient information is present in the training data.
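The abstract only outlines the scoring procedure at a high level. Below is a minimal, hypothetical sketch (Python with NumPy/SciPy) of one way a KL-divergence-based sensitivity score of this kind could be computed; the stand-in linear classifier, the `insert_unique_feature` patching helper, and the simple averaging over out-of-distribution images are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: the abstract only states that the score compares KL divergences
# of the model's output distributions on modified out-of-distribution images.
# The model, feature insertion, and aggregation below are illustrative assumptions.
import numpy as np
from scipy.special import softmax
from scipy.stats import entropy  # entropy(p, q) computes KL(p || q)

rng = np.random.default_rng(0)

def model_logits(x, W, b):
    # Stand-in for a trained classifier: a single linear layer on the flattened image.
    return x.reshape(-1) @ W + b

def insert_unique_feature(image, patch, top=0, left=0):
    # Overwrite a small region with the unique feature (e.g. a rendered name).
    out = image.copy()
    h, w = patch.shape
    out[top:top + h, left:left + w] = patch
    return out

def sensitivity_score(images, patch, W, b):
    # Average KL divergence between output distributions with and without the
    # unique feature; larger values suggest higher sensitivity to the feature.
    kls = []
    for img in images:
        p_clean = softmax(model_logits(img, W, b))
        p_feat = softmax(model_logits(insert_unique_feature(img, patch), W, b))
        kls.append(entropy(p_feat, p_clean))
    return float(np.mean(kls))

# Toy usage with random weights and random out-of-distribution images.
W, b = rng.normal(size=(28 * 28, 10)), rng.normal(size=10)
ood_images = [rng.uniform(size=(28, 28)) for _ in range(16)]
feature_patch = np.ones((4, 8))  # hypothetical unique-feature patch
print(sensitivity_score(ood_images, feature_patch, W, b))
```

In practice the random-weight stand-in would be replaced by the trained model under analysis, and the out-of-distribution images would be drawn from the assumed data domain rather than generated at random.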