Survival analysis, also known as time-to-event analysis, is an important problem in healthcare because it has a wide-ranging impact on patients and palliative care. Many survival analysis methods assume that the survival data is centrally available, either from a single medical center or through data sharing across multiple centers. However, the sensitivity of patient attributes and strict privacy laws increasingly prohibit the sharing of healthcare data. To address this challenge, the research community has turned to decentralized training and the sharing of model parameters under the Federated Learning (FL) paradigm. In this paper, we study the use of FL for survival analysis on distributed healthcare datasets. Recently, the popular Cox proportional hazards (CPH) models have been adapted to FL settings; however, because of their linearity and proportional hazards assumptions, CPH models perform suboptimally, especially on non-linear, non-IID, and heavily censored survival datasets. To overcome these limitations of existing federated survival analysis methods, we combine the predictive accuracy of deep learning models with the power of pseudo values to propose FedPseudo, a first-of-its-kind pseudo value-based deep learning model for federated survival analysis (FSA). Furthermore, we introduce a novel approach for deriving pseudo values for survival probability in FL settings that speeds up their computation. Extensive experiments on synthetic and real-world datasets show that our pseudo value-based FL framework achieves performance similar to that of the best centrally trained deep survival analysis model. Moreover, our proposed FL approach obtains the best results across various censoring settings.
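For readers unfamiliar with pseudo values, the following minimal sketch shows the standard centralized jackknife construction for the survival probability at a fixed time t, built on the Kaplan-Meier estimator. It is not the federated derivation proposed in this paper, whose details the abstract does not give; the function names `km_survival` and `jackknife_pseudo_values` are hypothetical and chosen only for illustration.

```python
import numpy as np


def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t); events[i] is 1 for an event, 0 if censored."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    surv = 1.0
    # Multiply the conditional survival factors at each distinct event time <= t.
    for u in np.unique(times[events]):
        if u > t:
            break
        at_risk = np.sum(times >= u)            # subjects still under observation at u
        deaths = np.sum((times == u) & events)  # events occurring exactly at u
        surv *= 1.0 - deaths / at_risk
    return surv


def jackknife_pseudo_values(times, events, t):
    """Leave-one-out pseudo values: n * S(t) - (n - 1) * S^{(-i)}(t)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    n = len(times)
    full = km_survival(times, events, t)
    keep = np.ones(n, dtype=bool)
    pseudo = np.empty(n)
    for i in range(n):
        keep[i] = False  # drop subject i
        loo = km_survival(times[keep], events[keep], t)
        pseudo[i] = n * full - (n - 1) * loo
        keep[i] = True   # restore subject i
    return pseudo
```

Each subject, censored or not, receives a numeric pseudo value that can serve as a regression target for a deep model. Without censoring, the pseudo value reduces to the indicator that the subject survived past t. This naive version recomputes the estimator n times, which is the kind of cost a faster derivation would aim to reduce.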