Edge computing and distributed machine learning have advanced to a level that can revolutionize how organizations operate. Distributed devices, such as Internet of Things (IoT) devices, often produce large amounts of data, eventually resulting in big data that can be vital for uncovering hidden patterns and other insights in numerous fields such as healthcare, banking, and policing. Data from domains such as healthcare and banking can contain sensitive information that may be exposed if it is not appropriately sanitized. Federated learning (FedML) is a recently developed distributed machine learning (DML) approach that tries to preserve privacy by bringing the training of an ML model to the data owners. However, the literature reports attack methods, such as membership inference, that exploit vulnerabilities of ML models as well as of the coordinating servers to retrieve private data. Hence, FedML needs additional measures to guarantee data privacy. Furthermore, big data often requires more resources than are available on a standard computer. This paper addresses these issues by proposing a distributed perturbation algorithm named DISTPAB for privacy preservation of horizontally partitioned data (i.e., data whose records are split across multiple parties that share the same attributes). DISTPAB alleviates computational bottlenecks by distributing the task of privacy preservation according to the asymmetry of resources in a distributed environment, which can contain resource-constrained devices as well as high-performance computers. Experiments show that DISTPAB provides high accuracy, high efficiency, high scalability, and high attack resistance. Further experiments on privacy-preserving FedML show that DISTPAB is an excellent solution for preventing privacy leaks in DML while preserving high data utility.
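To make "distributed perturbation of horizontally partitioned data" concrete, the sketch below shows generic geometric data perturbation (a random orthogonal rotation, a translation, and optional additive Gaussian noise). This is a technique from the broader data-perturbation literature, not DISTPAB itself. The role split is an assumption for illustration only: a well-resourced coordinator generates one secret set of parameters, and each resource-constrained data holder applies them to its own rows. The names `make_perturbation`, `perturb_partition`, and `PerturbationParams` are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def random_orthogonal(dim: int, rng: random.Random) -> Matrix:
    """Draw a random orthogonal matrix by Gram-Schmidt on Gaussian vectors."""
    basis: Matrix = []
    while len(basis) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Subtract the projection onto every orthonormal vector kept so far.
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-10:  # discard (numerically) dependent draws
            basis.append([vi / norm for vi in v])
    return basis  # orthonormal rows => square orthogonal matrix


@dataclass
class PerturbationParams:
    rotation: Matrix     # orthogonal R, shared by all data holders
    translation: Vector  # translation vector t, shared by all data holders
    noise_sd: float      # std. dev. of per-record Gaussian noise


def make_perturbation(dim: int, noise_sd: float, seed: int) -> PerturbationParams:
    """Coordinator side (hypothetical): generate one set of secret parameters."""
    rng = random.Random(seed)
    rotation = random_orthogonal(dim, rng)
    translation = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return PerturbationParams(rotation, translation, noise_sd)


def perturb_partition(rows: List[Vector], params: PerturbationParams,
                      rng: random.Random) -> List[Vector]:
    """Data-holder side (hypothetical): perturb local rows as y = R x + t + e."""
    r, t, sd = params.rotation, params.translation, params.noise_sd
    out: List[Vector] = []
    for x in rows:
        y = []
        for i in range(len(t)):
            noise = rng.gauss(0.0, sd) if sd > 0 else 0.0
            y.append(sum(r[i][j] * x[j] for j in range(len(x))) + t[i] + noise)
        out.append(y)
    return out


if __name__ == "__main__":
    params = make_perturbation(dim=3, noise_sd=0.0, seed=42)
    x_a, x_b = [1.0, 2.0, 3.0], [4.0, 0.0, -1.0]
    (y_a,) = perturb_partition([x_a], params, random.Random(1))  # holder A
    (y_b,) = perturb_partition([x_b], params, random.Random(2))  # holder B
    # With noise_sd=0 the cross-holder distance survives the perturbation.
    print(math.dist(x_a, x_b), math.dist(y_a, y_b))
```

Because every holder applies the same R and t, records from different partitions remain mutually comparable after perturbation, and distance-based learners see the same geometry as on the raw data. The noise term makes exact inversion harder at some cost in that utility.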