Exploiting Record Similarity for Practical Vertical Federated Learning

As the privacy of machine learning has drawn increasing attention, federated learning has been introduced to enable collaborative learning without revealing raw data. Notably, vertical federated learning (VFL), where parties share the same set of samples but each holds only part of the features, has a wide range of real-world applications. However, existing VFL studies rarely examine the "record linkage" process. They either design algorithms on the assumption that the data from different parties have already been linked, or they use simple linkage methods such as exact linkage or top-1 linkage. These approaches are unsuitable for many applications, such as those involving GPS locations or noisy titles, which require fuzzy matching. In this paper, we design a novel similarity-based VFL framework, FedSim, which is suitable for more real-world applications and achieves higher performance on traditional VFL tasks. Moreover, we theoretically analyze the privacy risk caused by sharing similarities. Our experiments on three synthetic datasets and five real-world datasets with various similarity metrics show that FedSim consistently outperforms other state-of-the-art baselines.
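To make the linkage terms in the abstract concrete, the following is a minimal sketch of exact linkage, top-1 linkage, and a top-k similarity-based linkage on toy two-dimensional keys. It is not FedSim's actual procedure: the function name `top_k_links`, the toy coordinates, the Euclidean distance, and the distance-to-similarity mapping `1 / (1 + d)` are all assumptions chosen for illustration.

```python
import math

def top_k_links(keys_a, keys_b, k):
    """For each record of party A, return the k nearest records of party B
    as (index_in_b, similarity) pairs, most similar first."""
    links = []
    for ka in keys_a:
        # Euclidean distance between linkage keys (e.g. 2-D coordinates).
        dists = sorted((math.dist(ka, kb), j) for j, kb in enumerate(keys_b))
        # Map distance to a similarity in (0, 1]; this mapping is an
        # illustrative assumption, not taken from the paper.
        links.append([(j, 1.0 / (1.0 + d)) for d, j in dists[:k]])
    return links

# Hypothetical toy keys: both parties hold noisy coordinates of the same places.
keys_a = [(0.0, 0.0), (5.0, 5.0)]
keys_b = [(0.1, 0.0), (4.0, 5.0), (0.0, 0.3), (9.0, 9.0)]

# Exact linkage: only identical keys match, so noisy keys link to nothing.
exact = [[j for j, kb in enumerate(keys_b) if kb == ka] for ka in keys_a]

# Top-1 linkage: keep only the single nearest record, discarding its similarity
# information about other close candidates.
top1 = top_k_links(keys_a, keys_b, k=1)

# Top-k linkage: keep several candidates together with their similarities,
# which a downstream model could then weigh.
top2 = top_k_links(keys_a, keys_b, k=2)

print(exact)
print(top1)
print(top2)
```

On these toy keys, exact linkage finds no matches at all, while the top-k variant retains a second plausible candidate for each record along with a similarity score. That retained similarity is the kind of information the abstract says a similarity-based framework exploits.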

Related content

Federated learning

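Federated learning is an emerging foundational technology in artificial intelligence. It was first proposed by Google in 2016, originally to address the problem of updating models locally for end users of Android phones. Its design goal is to carry out efficient machine learning across multiple participants or computing nodes while safeguarding information security during large-scale data exchange, protecting device data and personal data privacy, and ensuring legal and regulatory compliance. The machine learning algorithms usable in federated learning are not limited to neural networks; they also include other important algorithms such as random forests. Federated learning is expected to become the foundation of the next generation of collaborative AI algorithms and collaborative networks.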
