Motivated by the advancing computational capacity of distributed end-user equipment (UE), as well as increasing concerns about sharing private data, there has been considerable recent interest in machine learning (ML) and artificial intelligence (AI) that can be processed on distributed UEs. Specifically, in this paradigm, parts of an ML process are outsourced to multiple distributed UEs, and the processed ML information is then aggregated at a certain level at a central server, which turns a centralized ML process into a distributed one and brings significant benefits. However, this new distributed ML paradigm raises new privacy and security risks. In this paper, we provide a survey of the emerging security and privacy risks of distributed ML from the unique perspective of information exchange levels, which are defined according to the key steps of an ML process, i.e., i) the level of preprocessed data, ii) the level of learning models, iii) the level of extracted knowledge, and iv) the level of intermediate results. We explore and analyze the potential threats at each information exchange level based on an overview of state-of-the-art attack mechanisms, and then discuss possible defense methods against such threats. Finally, we complete the survey by providing an outlook on the challenges and possible directions for future research in this critical area.