Label Inference Attacks from Log-loss Scores

The log-loss (also known as cross-entropy loss) metric is ubiquitous across machine learning applications as a measure of the performance of classification algorithms. In this paper, we investigate the problem of inferring the labels of a dataset from a single log-loss score (or from multiple scores), without any other access to the dataset. Surprisingly, we show that for any finite number of label classes, it is possible to accurately infer the labels of the dataset from the reported log-loss score of a single carefully constructed prediction vector, provided that arbitrary-precision arithmetic is allowed. Additionally, we present label inference algorithms (attacks) that succeed even when noise is added to the log-loss scores and under limited-precision arithmetic. All our algorithms rely on ideas from number theory and combinatorics and require no model training. We run experimental simulations on real datasets to demonstrate the ease of running these attacks in practice.
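To make the single-query idea concrete, the following is a minimal sketch for the binary case. It is an illustrative construction and not necessarily the one used in the paper: the helper names (`first_primes`, `craft_predictions`, `infer_labels`) are hypothetical, and the choice of predictions p_i = q_i / (1 + q_i), with q_i the i-th prime, is an assumption made for this example. With that choice, exp(-N * loss + sum_i log(1 + q_i)) equals the product of the primes whose labels are 1, so the labels can be read off by divisibility. Because it uses ordinary double-precision floats, the sketch is only reliable for small datasets; larger ones would need arbitrary-precision arithmetic.

```python
import math


def first_primes(n):
    """Return the first n primes by trial division (fine for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def log_loss(labels, preds):
    """Standard binary log-loss (cross-entropy), averaged over the dataset."""
    n = len(labels)
    total = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(labels, preds)
    )
    return -total / n


def craft_predictions(n):
    """Assumed construction: p_i = q_i / (1 + q_i), q_i the i-th prime."""
    return [q / (1 + q) for q in first_primes(n)]


def infer_labels(score, n):
    """Recover binary labels from one log-loss score on craft_predictions(n).

    Since log p_i = log q_i - log(1 + q_i) and log(1 - p_i) = -log(1 + q_i),
    we have -n * score = sum_i y_i * log q_i - sum_i log(1 + q_i).
    """
    primes = first_primes(n)
    log_product = -n * score + sum(math.log(1 + q) for q in primes)
    # Product of the primes whose label is 1; rounding absorbs float error.
    product = round(math.exp(log_product))
    return [1 if product % q == 0 else 0 for q in primes]


# The "oracle" holds the secret labels and only reports a log-loss score.
secret = [1, 0, 1, 1, 0, 0, 1, 0]
score = log_loss(secret, craft_predictions(len(secret)))
recovered = infer_labels(score, len(secret))
```

In this sketch the attacker never sees `secret`; it submits one prediction vector, receives `score`, and decodes the labels through unique prime factorization. The product of the primes grows quickly with the dataset size, which is why limited precision becomes the binding constraint.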
