When developing artificial intelligence (AI)-based algorithms, the amount of available data plays a pivotal role: a large amount of data helps researchers and engineers build robust AI algorithms. For AI-based models in medical imaging, these data must be transferred from the medical institutions where they were acquired to the organizations developing the algorithms. This movement of data involves time-consuming formalities, such as complying with HIPAA, GDPR, etc. There is also a risk that patients' private data will be leaked, compromising their confidentiality. One solution to these problems is the Federated Learning framework. Federated Learning (FL) helps AI models generalize better and become more robust by using data from different sources, with different distributions and data characteristics, without moving all the data to a central server. In our paper, we apply the FL framework to train a deep learning model for a binary classification problem: predicting the presence or absence of COVID-19. We took three different sources of data and trained an individual model on each source. We then trained an FL model across all three sources and compared the performance of all the models. We demonstrate that the FL model performs better than the individual models. Moreover, the FL model performs on par with a model trained on all the data combined at a central server. Thus, Federated Learning leads to generalized AI models without the cost of data transfer and the associated regulatory overhead.
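To make the idea of training without centralizing data concrete, the following is a minimal sketch of one common FL scheme, federated averaging (FedAvg). The abstract does not state which aggregation rule the paper uses, so FedAvg is an assumption here, as are the logistic-regression model (standing in for the deep learning model), the synthetic three-site data, and all function names and hyperparameters. Only model weights leave each site; the raw samples never do.

```python
import math
import random


def local_update(weights, features, labels, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on one site's private data."""
    w = list(weights)  # copy so the shared global weights are not mutated
    n = len(labels)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid prediction
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj / n  # logistic-loss gradient
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def federated_average(site_weights, site_sizes):
    """Average the site models, weighting each by its local sample count."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def make_site(rng, n, shift):
    """Hypothetical stand-in for one institution's data (synthetic)."""
    features, labels = [], []
    for _ in range(n):
        x = [rng.gauss(shift, 1.0), rng.gauss(shift, 1.0), 1.0]  # last = bias
        y = 1.0 if 2.0 * x[0] - 1.0 * x[1] + 0.5 > 0 else 0.0
        features.append(x)
        labels.append(y)
    return features, labels


def train_federated(sites, rounds=20):
    """Server loop: broadcast weights, collect local updates, average them."""
    w = [0.0] * len(sites[0][0][0])
    for _ in range(rounds):
        local = [local_update(w, X, y) for X, y in sites]
        w = federated_average(local, [len(y) for _, y in sites])
    return w


# Three sites with different sizes and feature distributions.
rng = random.Random(0)
sites = [make_site(rng, n, s) for n, s in [(200, -0.5), (300, 0.0), (150, 0.5)]]
global_w = train_federated(sites)
```

Weighting by sample count means larger sites contribute more to the shared model. In a real deployment the local updates would run at each institution and only the weight vectors would be sent to the server.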