Machine learning classifiers rely on loss functions for performance evaluation, often on a private (hidden) dataset. Label inference was recently introduced as the problem of reconstructing the ground-truth labels of this private dataset from only the (possibly perturbed) loss function values evaluated at chosen prediction vectors, without any other access to the hidden dataset. Existing results have demonstrated that this inference is possible for specific loss functions, such as the cross-entropy loss. In this paper, we introduce the notion of codomain separability to formally study the necessary and sufficient conditions under which label inference is possible from any (noisy) loss function values. Using this notion, we show that for many commonly used loss functions, including multiclass cross-entropy with common activation functions and some Bregman divergence-based losses, it is possible to design label inference attacks for arbitrary noise levels. We demonstrate that these attacks can also be carried out through actual neural network models, and we analyze, both formally and empirically, the role of finite-precision arithmetic in this setting.
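To make the kind of attack the abstract refers to concrete, the following is a minimal Python sketch of a single-query label inference attack on noiseless binary cross-entropy, using the prime-based prediction construction known from the label inference literature; the function names and demo size are illustrative assumptions, and the paper's actual attacks (noisy scores, multiclass losses, other activations) are more general than this sketch.

```python
import math

# First few primes (illustrative demo size; the construction scales
# until floating-point precision runs out, which is exactly the
# finite-precision issue the abstract alludes to).
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

def log_loss(labels, preds):
    """Standard binary cross-entropy (log-loss), averaged over examples."""
    n = len(labels)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, preds)) / n

def attack_predictions(n):
    """Adversarial prediction vector: p_i = pi_i / (1 + pi_i) for the i-th prime."""
    return [q / (q + 1) for q in PRIMES[:n]]

def infer_labels(loss, n):
    """Recover all n hidden labels from a single log-loss value.

    With the predictions above,
        n * loss = sum_i log(1 + pi_i) - sum_{i: y_i = 1} log(pi_i),
    so exp(sum_i log(1 + pi_i) - n * loss) equals the product of the
    primes at positions labeled 1; unique factorization decodes them.
    """
    total = sum(math.log(q + 1) for q in PRIMES[:n])
    product = round(math.exp(total - n * loss))
    return [1 if product % q == 0 else 0 for q in PRIMES[:n]]

if __name__ == "__main__":
    secret = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]   # hidden ground-truth labels
    preds = attack_predictions(len(secret))
    observed = log_loss(secret, preds)         # the only value the attacker sees
    assert infer_labels(observed, len(secret)) == secret
```

In this sketch a single loss evaluation suffices because each label pattern maps to a distinct loss value (the codomain separability property studied in the paper); with perturbed scores or longer label vectors, the gaps between these values shrink and the attack must contend with noise and machine precision.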