Federated Learning (FL) is quickly becoming a go-to distributed training paradigm for users to jointly train a global model without physically sharing their data. Users can indirectly contribute to, and directly benefit from, a much larger aggregate data corpus used to train the global model. However, literature on successful application of FL in real-world problem settings is somewhat sparse. In this paper, we describe our experience applying an FL-based solution to the Named Entity Recognition (NER) task for an adverse event detection application in the context of mass-scale vaccination programs. We present a comprehensive empirical analysis of the various dimensions of benefit gained with FL-based training. Furthermore, we investigate the effects of tighter Differential Privacy (DP) constraints in highly sensitive settings where federation users must enforce Local DP to ensure strict privacy guarantees. We show that Local DP can severely cripple the global model's prediction accuracy, thus disincentivizing users from participating in the federation. In response, we demonstrate how recent innovations in personalization methods can help significantly recover the lost accuracy. We focus our analysis on the Federated Fine-Tuning algorithm, FedFT, and prove that it is not PAC Identifiable, thus making it even more attractive for FL-based training.
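To make the Local DP mechanism discussed above concrete, the following is a minimal sketch of how a federation client might sanitize its model update before sharing it with the aggregator: the update is norm-clipped and perturbed with Gaussian noise, and the server then averages the sanitized updates. All function names and parameters here (`local_dp_update`, `clip_norm`, `noise_multiplier`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def local_dp_update(update, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Sanitize one client's model update under Local DP (illustrative).

    The update vector is clipped to an L2 norm of `clip_norm`, then
    Gaussian noise scaled by `noise_multiplier * clip_norm` is added,
    so the server never sees the raw (unperturbed) client update.
    """
    rng = np.random.default_rng(rng)
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

def federated_average(sanitized_updates):
    """Server-side aggregation: simple mean of the noisy client updates."""
    return np.mean(sanitized_updates, axis=0)
```

Larger `noise_multiplier` values give stronger privacy guarantees but inject more noise into the aggregate, which is the accuracy degradation the paper reports; personalization methods such as locally fine-tuning the noisy global model are one way to recover part of that loss.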