The application of Machine Learning (ML) to the diagnosis of rare diseases, such as collagen VI-related dystrophies (COL6-RD), is fundamentally limited by the scarcity and fragmentation of available data. Attempts to expand sampling across hospitals, institutions, or countries with differing regulations face severe privacy, regulatory, and logistical obstacles that are often difficult to overcome. Federated Learning (FL) provides a promising solution by enabling collaborative model training across decentralized datasets while keeping patient data local and private. Here, we report a novel global FL initiative, built on the Sherpa.ai FL platform, that trains across distributed datasets held by two international organizations to diagnose COL6-RD from collagen VI immunofluorescence microscopy images of patient-derived fibroblast cultures. The resulting ML model classifies collagen VI patient images into the three primary pathogenic mechanism groups associated with COL6-RD: exon skipping, glycine substitution, and pseudoexon insertion. The federated model achieved an F1-score of 0.82, outperforming single-organization models (0.57-0.75). These results demonstrate that FL substantially improves diagnostic utility and generalizability compared to isolated institutional models. Beyond enabling more accurate diagnosis, we anticipate that this approach will support the interpretation of variants of uncertain significance and guide the prioritization of sequencing strategies to identify novel pathogenic variants.