With ever-growing data and the need to develop powerful machine learning models, data owners increasingly depend on untrusted platforms (e.g., public clouds, edges, and machine learning service providers). However, sensitive data and models then become susceptible to unauthorized access, misuse, and privacy compromises. Recently, a body of research has emerged on training machine learning models on encrypted outsourced data with untrusted platforms. In this survey, we summarize the studies in this emerging area with a unified framework to highlight the major challenges and approaches. We focus on the cryptographic approaches for confidential machine learning (CML), while also covering other directions such as perturbation-based approaches and CML in hardware-assisted confidential computing environments. The discussion takes a holistic view, considering the rich context of related threat models, security assumptions, attacks, design philosophies, and the associated trade-offs among data utility, cost, and confidentiality.