Federated learning (FL) is one of the most popular distributed machine learning techniques. FL allows machine-learning models to be trained without gathering raw data at a single point for processing. Instead, local models are trained on local data; the models are then shared and combined. This approach preserves data privacy because locally trained models are shared instead of the raw data themselves. Broadly, FL can be divided into horizontal federated learning (HFL) and vertical federated learning (VFL). In the former, different parties hold different samples over the same set of features; in the latter, different parties hold different features belonging to the same set of samples. In a number of practical scenarios, VFL is more relevant than HFL because different companies (e.g., a bank and a retailer) hold different features (e.g., credit history and shopping history) for the same set of customers. VFL is an emerging area of research, but it is less well established than HFL. Moreover, VFL-related studies are dispersed, and the connections among them are not obvious. This survey therefore aims to bring these VFL-related studies together in one place. First, we classify existing VFL structures and algorithms. Second, we present the security and privacy threats to VFL. Third, for the benefit of future researchers, we discuss the challenges and prospects of VFL in detail.
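The distinction between HFL and VFL can be illustrated with a small sketch of how one table would be partitioned across two parties. This is only an illustration of the data layout, not a training protocol; the customer IDs, feature names, values, and the helper functions `horizontal_split` and `vertical_split` are all hypothetical.

```python
# Toy table: one row per customer, keyed by a hypothetical customer ID.
# All names and values are made up for illustration.
FEATURES = ["credit_score", "loan_count", "basket_size", "visits_per_month"]
DATA = {
    "c1": [710, 1, 42.0, 3],
    "c2": [640, 2, 18.5, 7],
    "c3": [580, 0, 77.0, 1],
    "c4": [790, 3, 23.0, 5],
}


def horizontal_split(data, sample_groups):
    """HFL-style layout: each party holds different samples, all features."""
    return [{cid: list(data[cid]) for cid in ids} for ids in sample_groups]


def vertical_split(data, features, feature_groups):
    """VFL-style layout: each party holds all samples, only some features."""
    parties = []
    for names in feature_groups:
        cols = [features.index(name) for name in names]
        parties.append(
            {cid: [row[c] for c in cols] for cid, row in data.items()}
        )
    return parties


# HFL: two parties with the same feature set but disjoint customers.
hfl_parties = horizontal_split(DATA, [["c1", "c2"], ["c3", "c4"]])

# VFL: a "bank" and a "retailer" with the same customers but disjoint features.
vfl_parties = vertical_split(
    DATA,
    FEATURES,
    [["credit_score", "loan_count"], ["basket_size", "visits_per_month"]],
)
```

In the vertical layout, both parties share the same customer IDs, which is why VFL depends on the parties being able to match records belonging to the same customers.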