For rare diseases, identifying predictive factors via multivariable statistical analysis is often impossible because the available data sets are too small. Combining data from different medical centers into a single, larger database would alleviate this problem, but is challenging in practice because of regulatory and logistical obstacles. Federated Learning (FL) is a machine learning approach that aims to construct, from local inferences in separate data centers, what would have been inferred had the data sets been merged. It seeks to harvest the statistical power of larger data sets without actually creating them. The FL strategy is not always feasible for small data sets. Therefore, in this paper we refine and implement an alternative Bayesian Federated Inference (BFI) framework for multicenter data with the same aim as FL. The BFI framework is designed to cope with small data sets by inferring locally not only the optimal parameter values but also additional features of the posterior parameter distribution, thereby capturing information beyond what is used in FL. BFI has the additional benefit that a single inference cycle across the centers is sufficient, whereas FL needs multiple cycles. We quantify the performance of the proposed methodology on simulated and real-life data.
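The abstract does not spell out how the local results are combined. As a rough illustration of the single-cycle idea only, the sketch below assumes a linear model with known noise variance and a shared zero-mean Gaussian prior, so that each center's posterior is exactly Gaussian. Each center reports its MAP estimate together with the posterior precision (the "additional feature" of the posterior), and these are merged in one pass. The function names `local_posterior` and `combine` and the simulated data are hypothetical and are not taken from the paper.

```python
import numpy as np

def local_posterior(X, y, prior_prec, noise_var=1.0):
    """MAP estimate and posterior precision for one center.

    Assumes y = X @ theta + Gaussian noise and a zero-mean Gaussian prior
    with precision matrix prior_prec (illustrative assumptions).
    """
    A = X.T @ X / noise_var + prior_prec          # posterior precision (curvature at the MAP)
    theta = np.linalg.solve(A, X.T @ y / noise_var)
    return theta, A

def combine(local_results, prior_prec):
    """Merge local (theta, A) pairs in a single pass, without sharing raw data."""
    n_centers = len(local_results)
    # Every local posterior includes the prior once; keep it only once overall.
    A = sum(A_l for _, A_l in local_results) - (n_centers - 1) * prior_prec
    b = sum(A_l @ theta_l for theta_l, A_l in local_results)
    return np.linalg.solve(A, b), A

# Simulated data for three small centers (illustrative only).
rng = np.random.default_rng(0)
true_theta = np.array([1.0, -2.0, 0.5])
prior_prec = 0.1 * np.eye(3)
centers = []
for n in (15, 20, 25):
    X = rng.normal(size=(n, 3))
    y = X @ true_theta + rng.normal(size=n)
    centers.append((X, y))

# One inference cycle: each center sends only (theta, A) to the coordinator.
local_results = [local_posterior(X, y, prior_prec) for X, y in centers]
theta_bfi, A_bfi = combine(local_results, prior_prec)

# Reference: the estimate that would be obtained if the data were pooled.
X_all = np.vstack([X for X, _ in centers])
y_all = np.concatenate([y for _, y in centers])
theta_pooled, A_pooled = local_posterior(X_all, y_all, prior_prec)
```

In this Gaussian special case the combined estimate coincides with the pooled one up to floating-point error. For non-Gaussian models the same rule would only be an approximation, and its accuracy is not something this sketch demonstrates.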