The amount of biomedical data continues to grow rapidly, yet the ability to pool data from multiple sites for joint analysis remains limited by security, privacy, and regulatory concerns. We present a Secure Federated Learning architecture, MetisFL, which enables distributed training of neural networks over multiple data sources without sharing data. Each site trains the neural network on its private data for some time, then shares the neural network parameters (i.e., weights, gradients) with a Federation Controller, which in turn aggregates the local models, sends the resulting community model back to each site, and the process repeats. Our architecture provides strong security and privacy. First, sample data never leave a site. Second, neural parameters are encrypted before transmission, and the community model is computed under fully homomorphic encryption. Finally, we use information-theoretic methods to limit information leakage from the neural model and prevent a curious site from performing membership inference attacks. We demonstrate this architecture in neuroimaging. Specifically, we investigate training neural models to classify Alzheimer's disease and to estimate Brain Age from magnetic resonance imaging (MRI) datasets distributed across multiple sites, including heterogeneous environments in which sites hold different amounts of data, follow different statistical distributions, and have different computational capabilities.
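The round-based protocol and encrypted aggregation described above can be summarized in a short sketch. The following Python is a minimal illustration under stated assumptions: `He`, `local_training`, and `federation_round` are hypothetical names invented here for exposition, the "encryption" is an insecure identity placeholder standing in for a real additively homomorphic scheme (e.g., a CKKS implementation), and the local update is mocked; none of this is the MetisFL API.

```python
import numpy as np


class He:
    """Toy additively homomorphic 'cipher' (identity map) used only to show
    that the controller sums ciphertexts without ever seeing plaintext
    weights. NOT secure; a placeholder for a real FHE scheme."""

    @staticmethod
    def encrypt(w):
        return w.copy()

    @staticmethod
    def add(a, b):
        return a + b

    @staticmethod
    def decrypt(c):
        return c


def local_training(weights, data, lr=0.01, steps=10):
    # Placeholder for a site's private training (e.g., SGD on local MRI data).
    rng = np.random.default_rng(0)
    for _ in range(steps):
        weights = weights - lr * rng.normal(size=weights.shape)  # mock gradient step
    return weights


def federation_round(community, site_data):
    # Each site trains locally, encrypts its parameters, and sends ciphertexts.
    ciphers = [He.encrypt(local_training(community, d)) for d in site_data]
    # The Federation Controller aggregates under encryption (a sum of
    # ciphertexts), never observing any site's plaintext parameters.
    agg = ciphers[0]
    for c in ciphers[1:]:
        agg = He.add(agg, c)
    # Decryption (performed by the key holders, not the controller) yields the
    # averaged community model.
    return He.decrypt(agg) / len(site_data)


community = np.zeros(4)     # community model parameters
sites = [None, None, None]  # stand-ins for three sites' private datasets
for _ in range(5):          # the round-based process repeats
    community = federation_round(community, sites)
```

Note that simple averaging only requires additive homomorphism: the controller computes a sum of encrypted parameter vectors, so the more expensive fully homomorphic operations are reserved for aggregation schemes that also weight or transform the local models under encryption.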