With the ever-growing volume of data and the need to develop powerful machine learning models, data owners increasingly depend on various untrusted platforms (e.g., public clouds, edge platforms, and machine learning service providers) for scalable processing or collaborative learning. As a result, sensitive data and models are exposed to risks of unauthorized access, misuse, and privacy compromise. A relatively new body of research addresses these concerns by training machine learning models confidentially on protected data. In this survey, we summarize notable studies in this emerging area of research. Using a unified framework, we highlight the critical challenges and innovations in outsourcing machine learning confidentially. We focus on cryptographic approaches to confidential machine learning (CML), primarily for model training, while also covering other directions such as perturbation-based approaches and CML in hardware-assisted computing environments. The discussion takes a holistic view, considering the rich context of related threat models, security assumptions, design principles, and the associated trade-offs among data utility, cost, and confidentiality.