Artificial intelligence (AI) methods are revolutionizing medical image analysis. However, robust AI models require large multi-site datasets for training. While multiple stakeholders have made datasets publicly available, the ways in which these data are labeled differ widely. For example, one dataset of chest radiographs might contain labels denoting the presence of metastases in the lung, while another dataset of chest radiographs might focus on the presence of pneumonia. With conventional approaches, these data cannot be used together to train a single AI model. We propose a new framework, which we call flexible federated learning (FFL), for collaborative training on such data. Using publicly available data comprising 695,000 chest radiographs from five institutions, each with differing labels, we demonstrate that large and heterogeneously labeled datasets can be used to train a single large AI model with this framework. We find that models trained with FFL outperform models that are trained only on data with matching annotations. This may pave the way for training truly large-scale AI models that make efficient use of all existing data.
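The abstract does not specify how FFL combines sites with differing labels. As a minimal sketch of one plausible arrangement, assume each site trains a shared feature extractor ("backbone") together with its own label-specific classification head, and that only the backbone is averaged across sites. The site names, label sets, array shapes, and the helper `average_backbones` below are hypothetical and serve only to illustrate the idea; they are not taken from the paper.

```python
import numpy as np


def average_backbones(site_backbones, site_sizes):
    """Average shared backbone weights, weighted by each site's dataset size."""
    weights = np.asarray(site_sizes, dtype=float)
    weights /= weights.sum()
    return sum(w * b for w, b in zip(weights, site_backbones))


# Hypothetical sites: identical backbone shape, different label sets.
rng = np.random.default_rng(0)
feature_dim, hidden_dim = 8, 4
sites = {
    "site_a": {"labels": ["metastasis"], "n_images": 100},
    "site_b": {"labels": ["pneumonia", "effusion"], "n_images": 300},
}
for site in sites.values():
    # Stand-ins for locally trained parameters (no real training happens here).
    site["backbone"] = rng.normal(size=(feature_dim, hidden_dim))
    site["head"] = rng.normal(size=(hidden_dim, len(site["labels"])))

# One aggregation round: only the backbone is shared and averaged.
global_backbone = average_backbones(
    [site["backbone"] for site in sites.values()],
    [site["n_images"] for site in sites.values()],
)
for site in sites.values():
    site["backbone"] = global_backbone.copy()  # heads stay local and untouched
```

Under this assumption, the heads can differ in size because they never leave their site, which is what would allow datasets with non-matching annotations to contribute to one shared model.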