With the advancement of machine learning (ML) and growing awareness of it, many organizations that own data but lack ML expertise (data owners) would like to pool their data and collaborate with those that have the expertise but need data from diverse sources to train truly generalizable models (model owners). In such collaborative ML, each data owner wants to protect the privacy of its training data, while the model owner wants to keep the model and the training method confidential, since they may contain intellectual property. However, existing private ML solutions, such as federated learning and split learning, cannot meet the privacy requirements of data owners and model owners at the same time. This paper presents Citadel, a scalable collaborative ML system that protects the privacy of both data owners and the model owner on untrusted infrastructure with the help of Intel SGX. Citadel performs distributed training across multiple training enclaves running on behalf of the data owners and an aggregator enclave running on behalf of the model owner. Citadel further establishes a strong information barrier between these enclaves by means of zero-sum masking and hierarchical aggregation, which prevents data and model leakage during collaborative training. Compared with existing SGX-protected training systems, Citadel offers better scalability and stronger privacy guarantees for collaborative ML. Cloud deployment with various ML models shows that Citadel scales to a large number of enclaves with less than 1.73X slowdown caused by SGX.
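To make the idea of zero-sum masking concrete, the following is a minimal Python sketch of the general technique, not Citadel's actual protocol. It assumes that updates are encoded as integers modulo a fixed value, that some party hands each training enclave one mask, and that the masks sum to zero; the names `zero_sum_masks`, `mask_update`, `aggregate`, and `MODULUS` are hypothetical and do not come from the paper. The abstract does not say how masks are generated or distributed, and hierarchical aggregation is not modeled here.

```python
import secrets

# Assumption: model updates are fixed-point integers reduced modulo 2**32,
# so that masks cancel exactly (floating-point masks would leave rounding error).
MODULUS = 2**32


def zero_sum_masks(num_parties, length):
    """Return num_parties random vectors whose element-wise sum is 0 mod MODULUS."""
    # The first num_parties - 1 masks are uniformly random.
    masks = [[secrets.randbelow(MODULUS) for _ in range(length)]
             for _ in range(num_parties - 1)]
    # The last mask is chosen so that every column sums to zero.
    last = [(-sum(m[i] for m in masks)) % MODULUS for i in range(length)]
    masks.append(last)
    return masks


def mask_update(update, mask):
    """Hide one party's update by adding its mask element-wise."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates):
    """Sum the masked updates; the masks cancel, leaving only the true sum."""
    return [sum(col) % MODULUS for col in zip(*masked_updates)]


# Three hypothetical training enclaves, each holding a small integer update.
updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masks = zero_sum_masks(len(updates), 3)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
total = aggregate(masked)
```

In this sketch the aggregator sees only the masked vectors, each of which looks uniformly random on its own, yet `total` equals the element-wise sum of the original updates because the masks cancel.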