Estimation of conditional average treatment effects (CATEs) is an important topic in the sciences. CATEs can be estimated with high accuracy if data distributed across multiple parties can be centralized. However, such data are difficult to aggregate owing to privacy concerns. To address this issue, we propose data collaboration double machine learning, a method that estimates CATE models while preserving the privacy of distributed data, and evaluate it through simulations. Our contributions are threefold. First, our method enables estimation and testing of semi-parametric CATE models on distributed data without iterative communication. Semi-parametric CATE models allow estimation and testing that are more robust to model mis-specification than parametric models. Second, our method enables collaborative estimation across multiple time points and different parties. Third, our method performed as well as or better than other methods in simulations using synthetic, semi-synthetic, and real-world datasets.