Scale Federated Learning for Label Set Mismatch in Medical Image Classification

Federated learning (FL) has been introduced to the healthcare domain as a decentralized learning paradigm that allows multiple parties to train a model collaboratively without privacy leakage. However, most previous studies have assumed that every client holds an identical label set. In reality, medical specialists tend to annotate only the diseases within their area of expertise or interest, so label sets can differ across clients and may even be disjoint. In this paper, we propose FedLSM, a framework that addresses this label set mismatch problem. FedLSM applies different training strategies to data with different uncertainty levels in order to make efficient use of unlabeled or partially labeled data, and it uses class-wise adaptive aggregation in the classification layer to avoid inaccurate aggregation when clients have missing labels. We evaluate FedLSM on two public real-world medical image datasets, chest X-ray (CXR) diagnosis with 112,120 CXR images and skin lesion diagnosis with 10,015 dermoscopy images, and show that it significantly outperforms other state-of-the-art FL algorithms. Code will be made available upon acceptance.
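To make the idea of class-wise aggregation in the classification layer more concrete, here is a minimal sketch in plain Python. It is not the FedLSM implementation: the abstract does not give the aggregation rule, so the function name `classwise_aggregate`, the sample-count weighting, and the fallback to the previous global row are all assumptions made for illustration. The only idea taken from the abstract is that a client lacking labels for a class should not contribute to that class's classifier weights.

```python
from typing import List, Set


def classwise_aggregate(
    client_heads: List[List[List[float]]],
    client_label_sets: List[Set[int]],
    client_sizes: List[int],
    global_head: List[List[float]],
) -> List[List[float]]:
    """Aggregate a classification layer one class (row) at a time.

    client_heads[k][c] is client k's weight vector for class c.
    Only clients whose label set contains c contribute to row c.
    Assumption: contributors are weighted by their sample count, and a
    class that no client has labeled keeps its previous global row.
    """
    new_head = []
    for c in range(len(global_head)):
        # Clients that actually hold annotations for class c.
        contributors = [
            k for k, labels in enumerate(client_label_sets) if c in labels
        ]
        total = sum(client_sizes[k] for k in contributors)
        if total == 0:
            # Nobody can supervise this class: keep the old global row.
            new_head.append(list(global_head[c]))
            continue
        row = [0.0] * len(global_head[c])
        for k in contributors:
            weight = client_sizes[k] / total
            for j, value in enumerate(client_heads[k][c]):
                row[j] += weight * value
        new_head.append(row)
    return new_head


# Two hypothetical clients with partially overlapping label sets.
head_a = [[1.0, 1.0], [2.0, 2.0], [9.0, 9.0], [5.0, 5.0]]
head_b = [[3.0, 3.0], [8.0, 8.0], [4.0, 4.0], [6.0, 6.0]]
previous_global = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [7.0, 7.0]]

aggregated = classwise_aggregate(
    client_heads=[head_a, head_b],
    client_label_sets=[{0, 1}, {0, 2}],
    client_sizes=[100, 300],
    global_head=previous_global,
)
```

In this toy setup, class 0 is averaged over both clients, classes 1 and 2 each come from the single client that annotates them, and class 3 keeps its previous global weights. Plain averaging over all clients would instead mix in rows that were never trained with supervision for that class.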


Related content

Federated learning


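Federated learning is an emerging foundational technique in artificial intelligence. It was first proposed by Google in 2016, originally to solve the problem of updating models locally on Android devices. Its design goal is to carry out efficient machine learning across multiple participants or compute nodes while ensuring information security during large-scale data exchange, protecting device data and personal data privacy, and remaining legally compliant. The machine learning algorithms that can be used in federated learning are not limited to neural networks and also include other important algorithms such as random forests. Federated learning is expected to become the foundation of the next generation of collaborative AI algorithms and cooperative networks.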