With the goal of generalizing to out-of-distribution (OOD) data, recent domain generalization methods aim to learn "stable" feature representations whose effect on the output remains invariant across domains. Given the theoretical connection between generalization and privacy, we ask whether better OOD generalization leads to better privacy for machine learning models, where privacy is measured through robustness to membership inference (MI) attacks. In general, we find that the relationship does not hold. Through extensive evaluation on a synthetic dataset and image datasets such as MNIST, Fashion-MNIST, and Chest X-rays, we show that a lower OOD generalization gap does not imply better robustness to MI attacks. Instead, privacy benefits depend on the extent to which a model captures stable features. A model that captures stable features is more robust to MI attacks than models that exhibit better OOD generalization but do not learn stable features. Further, for the same provable differential privacy guarantees, a model that learns stable features provides higher utility than others. Our results offer the first extensive empirical study connecting stable features and privacy, and also carry a takeaway for the domain generalization community: MI attacks can be used as a complementary metric to measure model quality.
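Since robustness to MI attacks serves as the privacy metric here, a minimal sketch of one common attack of this kind may be helpful: a loss-threshold test in the style of Yeom et al., which flags an example as a training-set member when the model's loss on it is unusually low. The function name, the synthetic loss distributions, and the threshold choice below are illustrative assumptions, not the specific attack evaluated in the paper.

```python
import numpy as np

def loss_threshold_mi_attack(member_losses, nonmember_losses, threshold):
    """Flag an example as a training-set member if its loss is below the
    threshold; return the attack's balanced accuracy (0.5 = chance)."""
    # True positive rate: members correctly flagged as members.
    tpr = np.mean(member_losses < threshold)
    # True negative rate: non-members correctly flagged as non-members.
    tnr = np.mean(nonmember_losses >= threshold)
    return 0.5 * (tpr + tnr)

# Toy illustration: an overfit model yields lower losses on training
# (member) data than on held-out (non-member) data, so the gap between
# the two distributions is what the attack exploits.
rng = np.random.default_rng(0)
member_losses = rng.exponential(scale=0.2, size=1000)     # low train loss
nonmember_losses = rng.exponential(scale=0.5, size=1000)  # higher test loss
threshold = np.median(np.concatenate([member_losses, nonmember_losses]))

acc = loss_threshold_mi_attack(member_losses, nonmember_losses, threshold)
print(f"MI attack accuracy: {acc:.3f} (closer to 0.5 means more private)")
```

Under this metric, a model is considered more private the closer the attack's accuracy stays to 0.5, i.e., the less distinguishable its behavior on members and non-members.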