The amount of biomedical data continues to grow rapidly, but privacy and regulatory concerns limit the ability to analyze these data. Machine learning approaches that require data to be copied to a single location are hampered by the challenges of data sharing. Federated Learning is a promising approach for learning a joint model over data silos. This architecture shares no subject data across sites, only aggregated parameters, often in encrypted environments, thus satisfying privacy and regulatory requirements. Here, we describe our Federated Learning architecture and training policies. We demonstrate our approach with a brain age prediction model trained on structural MRI scans distributed across multiple sites with varying amounts of data and differing subject (age) distributions. In these heterogeneous environments, our Semi-Synchronous protocol provides faster convergence.
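To make the idea of sharing only aggregated parameters concrete, the following is a minimal sketch of generic federated averaging in Python with NumPy. It is not the architecture or the Semi-Synchronous protocol described above, whose details are not given here. The linear-regression task, the synthetic data, and all function names (`local_update`, `federated_average`, `make_sites`, `train`) are assumptions introduced purely for illustration. Each simulated site trains on its own data, and only the resulting weight vectors and sample counts reach the aggregation step.

```python
import numpy as np


def local_update(weights, X, y, lr=0.1, epochs=5):
    """Run a few gradient-descent steps of linear regression on one site's data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w


def federated_average(site_weights, site_sizes):
    """Combine site models, weighting each by its number of local samples."""
    sizes = np.asarray(site_sizes, dtype=float)
    stacked = np.stack(site_weights)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()


def make_sites(true_w, sizes, seed=0):
    """Create synthetic (hypothetical) sites holding different amounts of data."""
    rng = np.random.default_rng(seed)
    sites = []
    for n in sizes:
        X = rng.normal(size=(n, len(true_w)))
        y = X @ true_w + 0.01 * rng.normal(size=n)
        sites.append((X, y))
    return sites


def train(sites, dim, rounds=50):
    """Synchronous rounds: broadcast the global model, train locally, aggregate."""
    global_w = np.zeros(dim)
    for _ in range(rounds):
        # Raw data (X, y) never leaves its site; only weights are returned.
        local = [local_update(global_w, X, y) for X, y in sites]
        global_w = federated_average(local, [len(y) for _, y in sites])
    return global_w


if __name__ == "__main__":
    true_w = np.array([2.0, -1.0, 0.5])
    sites = make_sites(true_w, sizes=[20, 200, 1000])
    print(train(sites, dim=3))
```

In this sketch every site runs the same number of local epochs per round, so a round finishes only when the slowest site does. That synchronization cost is one reason heterogeneous sites can motivate alternative training policies.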