Kaplan-Meier estimators capture the survival behavior of a cohort. They are one of the key statistics in survival analysis. As with any estimator, they become more accurate in presence of larger datasets. This motivates multiple data holders to share their data in order to calculate a more accurate Kaplan-Meier estimator. However, these survival datasets often contain sensitive information of individuals and it is the responsibility of the data holders to protect their data, thus a naive sharing of data is often not viable. In this work, we propose two novel differentially private schemes that are facilitated by our novel synthetic dataset generation method. Based on these scheme we propose various paths that allow a joint estimation of the Kaplan-Meier curves with strict privacy guarantees. Our contribution includes a taxonomy of methods for this task and an extensive experimental exploration and evaluation based on this structure. We show that we can construct a joint, global Kaplan-Meier estimator which satisfies very tight privacy guarantees and with no statistically-significant utility loss compared to the non-private centralized setting.
翻译:暂无翻译