We propose a framework in which multiple entities collaborate to build a machine learning model while preserving the privacy of their data. The approach uses feature embeddings produced by shared or per-entity feature extractors, which transform data into a common feature space in which the entities cooperate. We propose two specific methods and compare them with a baseline method. In Shared Feature Extractor (SFE) Learning, the entities use a shared feature extractor to compute feature embeddings of samples. In Locally Trained Feature Extractor (LTFE) Learning, each entity uses a separate feature extractor, and models are trained on the concatenated features from all entities. As a baseline, in Cooperatively Trained Feature Extractor (CTFE) Learning, the entities train models by sharing raw data. Secure multi-party computation algorithms are used to train models without revealing data or features in plaintext. We investigate the trade-offs among SFE, LTFE, and CTFE with respect to performance, privacy leakage (using an off-the-shelf membership inference attack), and computational cost. LTFE provides the most privacy, followed by SFE and then CTFE. Computational cost is lowest for SFE, and the relative speed of CTFE and LTFE depends on the network architecture. CTFE and LTFE provide the best accuracy. We use MNIST, a synthetic dataset, and a credit card fraud detection dataset for evaluation.
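To make the data flow of the three schemes concrete, the following is a minimal illustrative sketch, not the paper's implementation: it uses toy data, a fixed random linear map as a stand-in feature extractor, and omits both training and the secure multi-party computation layer. It only shows what each downstream model would consume under SFE, LTFE, and CTFE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed for illustration): two entities, each holding
# 4 samples with 6 raw features.
data_a = rng.normal(size=(4, 6))
data_b = rng.normal(size=(4, 6))

def make_extractor(out_dim, in_dim=6, seed=0):
    """Stand-in 'feature extractor': a fixed random linear map + tanh."""
    w = np.random.default_rng(seed).normal(size=(in_dim, out_dim))
    return lambda x: np.tanh(x @ w)

# SFE: both entities apply one shared extractor to their own samples;
# the model sees the stacked embeddings of all samples.
shared = make_extractor(out_dim=3, seed=1)
sfe_features = np.vstack([shared(data_a), shared(data_b)])   # shape (8, 3)

# LTFE: each entity has its own locally trained extractor; the model
# consumes the per-sample concatenation of all entities' embeddings
# (assuming the entities' samples are aligned).
ext_a = make_extractor(out_dim=3, seed=2)
ext_b = make_extractor(out_dim=3, seed=3)
ltfe_features = np.hstack([ext_a(data_a), ext_b(data_b)])    # shape (4, 6)

# CTFE (baseline): entities pool raw data and train on it jointly.
ctfe_inputs = np.vstack([data_a, data_b])                    # shape (8, 6)
```

In the actual framework, the embeddings (or raw data, for CTFE) would never be exchanged in plaintext; the secure multi-party computation protocol operates over them instead.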