Estimation of causal effects using machine learning methods has become an active research field in econometrics. In this paper, we study the finite-sample performance of meta-learners for estimating heterogeneous treatment effects when sample-splitting and cross-fitting are used to reduce overfitting bias. In both synthetic and semi-synthetic simulations, we find that the finite-sample performance of the meta-learners depends greatly on the estimation procedure. The results imply that, in large samples, sample-splitting is beneficial for reducing the bias of the meta-learners and cross-fitting for improving their efficiency, whereas full-sample estimation is preferable in small samples. Furthermore, we derive practical recommendations for applying specific meta-learners in empirical studies, depending on data characteristics such as the treatment share and the sample size.
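To make the three estimation procedures concrete, here is a minimal sketch for one common meta-learner, the DR-learner, chosen purely for illustration. It is not the paper's implementation. It assumes a single covariate, linear outcome models in place of machine learning methods, a constant propensity score (randomized treatment), two folds, and a hypothetical data-generating process with true effect tau(x) = 1 + 2x. All function names (`simulate`, `fit_nuisance`, `dr_learner`, etc.) are hypothetical. Under "full", the nuisance and final models reuse the same observations. Under "split", nuisances are fitted on one half and the CATE regression on the other. Under "cross", the roles are swapped and the two estimates are averaged.

```python
import random


def simulate(n, seed=0):
    """Hypothetical DGP: X ~ U(0,1), D ~ Bernoulli(0.5), true CATE tau(x) = 1 + 2x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        d = 1 if rng.random() < 0.5 else 0
        y = x + d * (1.0 + 2.0 * x) + rng.gauss(0.0, 1.0)
        data.append((x, d, y))
    return data


def fit_line(xs, ys):
    """Ordinary least squares with one regressor; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_nuisance(data):
    """Fit outcome models mu0, mu1 (linear) and a constant propensity score."""
    treated = [(x, y) for x, d, y in data if d == 1]
    control = [(x, y) for x, d, y in data if d == 0]
    mu1 = fit_line(*zip(*treated))
    mu0 = fit_line(*zip(*control))
    e = len(treated) / len(data)  # treatment share; assumes randomized treatment
    return mu0, mu1, e


def dr_pseudo_outcomes(nuisance, data):
    """Doubly robust pseudo-outcomes on `data`, using already fitted nuisances."""
    (a0, b0), (a1, b1), e = nuisance
    xs, phis = [], []
    for x, d, y in data:
        m0, m1 = a0 + b0 * x, a1 + b1 * x
        phi = m1 - m0 + d * (y - m1) / e - (1 - d) * (y - m0) / (1 - e)
        xs.append(x)
        phis.append(phi)
    return xs, phis


def dr_learner(data, mode="cross"):
    """Return (intercept, slope) of a linear CATE model tau(x) = a + b * x."""
    if mode == "full":
        # Full-sample estimation: nuisances and final model share all observations.
        return fit_line(*dr_pseudo_outcomes(fit_nuisance(data), data))
    if mode not in ("split", "cross"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Rows from simulate() are i.i.d., so a head/tail split is a random split.
    half = len(data) // 2
    fold_a, fold_b = data[:half], data[half:]
    # Sample-splitting: nuisances on fold A, CATE regression on fold B.
    est_ab = fit_line(*dr_pseudo_outcomes(fit_nuisance(fold_a), fold_b))
    if mode == "split":
        return est_ab
    # Cross-fitting: swap the roles of the folds and average the two estimates.
    est_ba = fit_line(*dr_pseudo_outcomes(fit_nuisance(fold_b), fold_a))
    return (est_ab[0] + est_ba[0]) / 2, (est_ab[1] + est_ba[1]) / 2


if __name__ == "__main__":
    sample = simulate(4000)
    for m in ("full", "split", "cross"):
        a, b = dr_learner(sample, m)
        print(f"{m:>5}: tau(x) ~ {a:.2f} + {b:.2f} x  (true: 1 + 2 x)")
```

Because the final CATE model here is linear, averaging the two cross-fitted coefficient vectors is the same as averaging their predictions. With flexible learners one would average the predictions instead. Cross-fitting lets every observation enter the final regression once, which is the efficiency gain over plain sample-splitting.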