Parameters or Privacy: A Provable Tradeoff Between Overparameterization and Membership Inference

A surprising phenomenon in modern machine learning is the ability of a highly overparameterized model to generalize well (small error on the test data) even when it is trained to memorize the training data (zero error on the training data). This has led to an arms race towards increasingly overparameterized models (e.g., deep learning). In this paper, we study an underexplored hidden cost of overparameterization: overparameterized models are more vulnerable to privacy attacks, in particular the membership inference attack, which predicts which (potentially sensitive) examples were used to train a model. We significantly extend the relatively few empirical results on this problem by theoretically proving, for an overparameterized linear regression model with Gaussian data, that membership inference vulnerability increases with the number of parameters. Moreover, a range of empirical studies indicates that more complex, nonlinear models exhibit the same behavior. Finally, we study different methods for mitigating such attacks in the overparameterized regime, such as noise addition and regularization, and conclude that simply reducing the number of parameters of an overparameterized model is an effective strategy to protect it from membership inference without greatly degrading its generalization performance.
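As a rough illustration of the setting described in the abstract, the sketch below fits linear regression on Gaussian data with NumPy and runs a simple loss-threshold membership inference attack against it. This is a toy under stated assumptions, not the paper's analysis or its attack: the function name `membership_advantage`, the threshold `tau`, the problem sizes, and the choice of attack (flag a point as a training member when its squared residual falls below `tau`) are all assumptions made here for illustration. It only contrasts a small model with an interpolating one and does not reproduce the paper's theoretical result.

```python
import numpy as np


def membership_advantage(p, n=50, d=200, sigma=0.5, n_test=2000, trials=20, seed=0):
    """Average advantage (TPR - FPR) of a loss-threshold attack.

    The model is least squares on the first p of d Gaussian features.
    All default values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    tau = sigma ** 2  # assumed attack threshold on the squared residual
    advantages = []
    for _ in range(trials):
        # Ground-truth linear model with signal variance close to 1.
        beta = rng.normal(size=d) / np.sqrt(d)
        X_train = rng.normal(size=(n, d))
        X_test = rng.normal(size=(n_test, d))
        y_train = X_train @ beta + sigma * rng.normal(size=n)
        y_test = X_test @ beta + sigma * rng.normal(size=n_test)

        # The pseudoinverse gives ordinary least squares when p < n and the
        # minimum-norm interpolating solution when p > n.
        w = np.linalg.pinv(X_train[:, :p]) @ y_train

        loss_train = (X_train[:, :p] @ w - y_train) ** 2  # members
        loss_test = (X_test[:, :p] @ w - y_test) ** 2     # non-members

        # Attack rule: predict "member" when the squared residual is below tau.
        tpr = np.mean(loss_train < tau)
        fpr = np.mean(loss_test < tau)
        advantages.append(tpr - fpr)
    return float(np.mean(advantages))


if __name__ == "__main__":
    for p in (5, 200):
        print(f"p={p:3d}  attack advantage={membership_advantage(p):.3f}")
```

With 50 training points, `p=5` is underparameterized and `p=200` interpolates the training set. The interpolating fit has essentially zero training residuals, so this particular attack flags every member correctly, and its advantage is then set by how often held-out points also fall under the threshold.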

