The development of technologies for causal inference with the privacy preservation of distributed data has attracted considerable attention in recent years. To address this issue, we propose a data collaboration quasi-experiment (DC-QE) that enables causal inference from distributed data with privacy preservation. In our method, first, local parties construct dimensionality-reduced intermediate representations from the private data. Second, they share intermediate representations, instead of private data for privacy preservation. Third, propensity scores were estimated from the shared intermediate representations. Finally, the treatment effects were estimated from propensity scores. Our method can reduce both random errors and biases, whereas existing methods can only reduce random errors in the estimation of treatment effects. Through numerical experiments on both artificial and real-world data, we confirmed that our method can lead to better estimation results than individual analyses. Dimensionality-reduction loses some of the information in the private data and causes performance degradation. However, we observed that in the experiments, sharing intermediate representations with many parties to resolve the lack of subjects and covariates, our method improved performance enough to overcome the degradation caused by dimensionality-reduction. With the spread of our method, intermediate representations can be published as open data to help researchers find causalities and accumulated as a knowledge base.
翻译:暂无翻译