The performance of deep learning models depends heavily on the amount of training data. It is common practice for today's data holders to merge their datasets and train models collaboratively, yet this poses a threat to data privacy. Unlike existing methods such as secure multi-party computation (MPC) and federated learning (FL), we find that representation learning has unique advantages in collaborative learning owing to its lower communication overhead and task independence. However, data representations face the threat of model inversion attacks. In this article, we formally define the collaborative learning scenario and quantify data utility and privacy. We then present ARS, a collaborative learning framework in which users share representations of their data to train models, adding imperceptible adversarial noise to the representations to defend against reconstruction and attribute extraction attacks. By evaluating ARS in different contexts, we demonstrate that our mechanism is effective against model inversion attacks and achieves a balance between privacy and utility. The ARS framework has wide applicability. First, ARS is valid for various data types, not limited to images. Second, the data representations shared by users can be utilized in different tasks. Third, the framework can be easily extended to the vertical data partitioning scenario.
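To make the core idea concrete, the following is a minimal sketch of sharing adversarially perturbed representations. All components here are illustrative assumptions, not the paper's exact construction: the encoder architecture, the surrogate decoder standing in for a model inversion attacker, the FGSM-style perturbation, and the budget `epsilon` are placeholders chosen for brevity.

```python
import torch
import torch.nn as nn

# Hypothetical components: each user holds a local encoder, and the inversion
# attacker is modeled by a surrogate decoder that tries to reconstruct inputs.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 64))            # placeholder encoder
surrogate_decoder = nn.Sequential(nn.Linear(64, 784), nn.Sigmoid())  # stand-in inversion model

def share_representation(x, epsilon=0.1):
    """Encode x, then add an imperceptible perturbation that degrades
    reconstruction by the surrogate decoder (an FGSM-style sketch)."""
    z = encoder(x).detach().requires_grad_(True)
    recon = surrogate_decoder(z)
    # Maximize the attacker's reconstruction error: ascend the loss gradient.
    loss = nn.functional.mse_loss(recon, x.flatten(1))
    loss.backward()
    z_shared = z + epsilon * z.grad.sign()
    return z_shared.detach()

# Usage: a user shares perturbed representations instead of raw data.
x = torch.rand(8, 1, 28, 28)  # e.g., a batch of MNIST-like images
z_shared = share_representation(x)
```

The design intuition is that a small, bounded perturbation of the representation barely affects downstream task utility while pushing the representation away from the region an inversion model can decode accurately.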