Distributed statistical analyses provide a promising approach to privacy protection when analysing data distributed over several databases. They bring the analysis to the data rather than the data to the analysis: the analyst receives anonymous summary statistics, which are combined into an aggregated result. We aim to calculate the AUC of a prediction score with a distributed approach, without gaining access to the data of the individual subjects held in the different databases. We use DataSHIELD as the technology for carrying out distributed analyses and use newly developed algorithms to validate the prediction score. Calibration can easily be implemented in the distributed setting, but discrimination, represented by the ROC curve and its AUC, is challenging. We base our approach on the ROC-GLM algorithm as well as on ideas from differential privacy. The proposed algorithms are evaluated in a simulation study. A real-world application is described: the audit use case of DIFUTURE (Medical Informatics Initiative), whose goal is to validate a treatment prediction rule for patients with newly diagnosed multiple sclerosis.