Background: AI-based analysis of sufficiently large, curated medical datasets has shown promise for early detection, faster diagnosis, better decision-making, and more effective treatment. However, access to such highly confidential and sensitive medical data, obtained from a variety of sources, is usually highly restricted, since improper use, unsafe storage, data leakage, or abuse could violate a person's privacy. In this work we apply a federated learning paradigm over heterogeneous, siloed sets of high-definition electrocardiograms from 12-lead ECG sensor arrays to train AI models. We evaluated whether the resulting models achieve performance equivalent to that of state-of-the-art models trained on the same data collected in a central place. Methods: We propose a privacy-preserving methodology for training AI models based on the federated learning paradigm over a heterogeneous, distributed dataset. The methodology is applied to a broad range of machine learning techniques based on gradient boosting, convolutional neural networks, and recurrent neural networks with long short-term memory. The models were trained over an ECG dataset containing 12-lead recordings collected from 43,059 patients from six geographically separate and heterogeneous sources. Findings: The resulting set of AI models for detecting cardiovascular abnormalities achieved predictive performance comparable to that of models trained using a centralised learning approach. Interpretation: Computing the parameters that contribute to the global model locally, and then exchanging only those parameters instead of the whole sensitive data as in conventional centralised ML, helps preserve medical data privacy.
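The parameter-exchange idea summarised in the Interpretation can be illustrated with a minimal sketch. It is not the methodology of this work: it assumes a plain logistic regression model, a sample-size-weighted average of the local parameters as the aggregation rule (the abstract does not name one), and synthetic data standing in for the six sources. All function names are hypothetical.

```python
import numpy as np


def local_update(weights, X, y, lr=0.1, epochs=5):
    """Run a few gradient-descent epochs of logistic regression on one silo.

    Only the updated parameter vector leaves the silo; X and y stay local.
    """
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (preds - y) / len(y)
        w -= lr * grad
    return w


def federated_average(client_weights, client_sizes):
    """Average client parameters, weighted by each client's sample count.

    Assumption: weighted averaging is used here purely for illustration.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()


def train_federated(silos, n_features, rounds=20):
    """Alternate local training and central aggregation for several rounds."""
    global_w = np.zeros(n_features)
    for _ in range(rounds):
        updates = [local_update(global_w, X, y) for X, y in silos]
        sizes = [len(y) for _, y in silos]
        global_w = federated_average(updates, sizes)
    return global_w


def make_synthetic_silos(n_silos=6, n_features=4, seed=0):
    """Create synthetic stand-in data (not ECG data) for n_silos sources."""
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=n_features)
    silos = []
    for _ in range(n_silos):
        n = int(rng.integers(50, 200))  # silos hold different amounts of data
        X = rng.normal(size=(n, n_features))
        y = (X @ true_w > 0).astype(float)
        silos.append((X, y))
    return silos, true_w


if __name__ == "__main__":
    silos, _ = make_synthetic_silos()
    w = train_federated(silos, n_features=4)
    X_all = np.vstack([X for X, _ in silos])
    y_all = np.concatenate([y for _, y in silos])
    accuracy = np.mean((X_all @ w > 0) == (y_all == 1.0))
    print(f"accuracy of the federated model on pooled data: {accuracy:.3f}")
```

In this sketch the coordinating server only ever sees parameter vectors and sample counts, never the recordings themselves, which is the property the Interpretation refers to.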