New methods designed to preserve data privacy require careful scrutiny. Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a ``privacy-preserving'' method is attacked. A recent work selected for an Outstanding Paper Award at ICML 2022 (Dong et al., 2022) claims that dataset condensation (DC) significantly improves data privacy when training machine learning models. This claim is supported by theoretical analysis of a specific dataset condensation technique and an empirical evaluation of resistance to some existing membership inference attacks. In this note we examine the claims in the work of Dong et al. (2022) and describe major flaws in both the empirical evaluation of the method and its theoretical analysis. These flaws imply that their work does not provide statistically significant evidence that DC improves the privacy of training ML models over a naive baseline. Moreover, previously published results show that DP-SGD, the standard approach to privacy-preserving ML, simultaneously achieves better accuracy and a (provably) lower membership inference attack success rate.