Big data has evolved remarkably over the last few years, driven by the enormous volume of data generated by newly emerging services and applications and by a massive number of Internet-of-Things (IoT) devices. The potential of big data can be realized through analytics and learning techniques in which data from various sources is transferred to a central cloud for storage, processing, and training. However, this conventional approach raises critical data privacy issues, as the data may include sensitive information such as personal details, government data, and bank accounts. To overcome this challenge, federated learning (FL) has emerged as a promising learning technique. However, the literature still lacks a comprehensive survey on FL for big data services and applications. In this article, we present a survey on the use of FL for big data services and applications, aiming to provide general readers with an overview of FL, big data, and the motivations behind the use of FL for big data. In particular, we extensively review the use of FL for key big data services, including big data acquisition, big data storage, big data analytics, and big data privacy preservation. Subsequently, we review the potential of FL for big data applications, such as smart cities, smart healthcare, smart transportation, smart grids, and social media. Further, we summarize a number of important projects on FL for big data and discuss key challenges of this topic along with several promising solutions and directions.