The advent of the big data era has brought new opportunities and challenges for treatment effect inference in data fusion, that is, on a mixed dataset collected from multiple sources, each with an independent treatment assignment mechanism. Because source labels may be omitted and confounders may be unmeasured, traditional methods cannot effectively estimate individual treatment assignment probabilities or infer treatment effects. We therefore propose to reconstruct the source label and model it as a Group Instrumental Variable (GIV) to implement IV-based regression for treatment effect estimation. In this paper, we conceptualize this line of thought and develop a unified framework (Meta-EM) to (1) map the raw data into a representation space in which Linear Mixed Models are constructed for the assigned treatment variable; (2) estimate the distribution differences and model the GIV for the different treatment assignment mechanisms; and (3) adopt an alternating training strategy that iteratively optimizes the representations and the joint distribution to model the GIV for IV regression. Empirical results demonstrate the advantages of Meta-EM over state-of-the-art methods.
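The abstract does not specify Meta-EM's representation network, losses, or training schedule, so the following is only a rough illustration of the core idea (recover a latent source label from the treatment assignment, then use it as an instrument), not the authors' method. It swaps in much simpler stand-ins: no learned representation, an EM-fitted mixture of linear regressions for the treatment in place of the Linear Mixed Model on representations, and plain two-stage least squares (2SLS) for the IV regression. The function names `em_mixture_of_regressions` and `two_stage_least_squares`, the quantile-based initialisation, and the simulated data-generating process are all hypothetical choices made for this sketch.

```python
import numpy as np


def em_mixture_of_regressions(X, t, n_groups=2, n_iter=50):
    """Hypothetical helper: fit t ~ sum_k pi_k N(D w_k, sigma_k^2) by EM.

    Returns the posterior group probabilities (responsibilities), which serve
    as a reconstructed group label for each sample.
    """
    n = len(t)
    D = np.column_stack([np.ones(n), X])  # design matrix with intercept
    # Heuristic initialisation (an assumption): bin pooled-OLS residuals by quantile.
    w0, *_ = np.linalg.lstsq(D, t, rcond=None)
    r0 = t - D @ w0
    cuts = np.quantile(r0, np.linspace(0, 1, n_groups + 1)[1:-1])
    resp = np.eye(n_groups)[np.digitize(r0, cuts)]
    for _ in range(n_iter):
        # M-step: mixing weights plus weighted least squares for each group.
        pis = resp.mean(axis=0)
        log_p = np.empty((n, n_groups))
        for k in range(n_groups):
            rk = resp[:, k]
            Dw = D * rk[:, None]
            w = np.linalg.solve(D.T @ Dw, Dw.T @ t)
            res = t - D @ w
            sigma = max(np.sqrt((rk * res**2).sum() / rk.sum()), 1e-6)
            # log(pi_k * Normal(t | D w_k, sigma_k^2)), shared constant dropped.
            log_p[:, k] = np.log(pis[k]) - np.log(sigma) - 0.5 * (res / sigma) ** 2
        # E-step: normalise to posterior probabilities (log-sum-exp trick).
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp


def two_stage_least_squares(y, t, X, Z):
    """Coefficient on t from 2SLS with exogenous controls X and instruments Z."""
    n = len(y)
    exog = np.column_stack([np.ones(n), X])
    first = np.column_stack([exog, Z])
    t_hat = first @ np.linalg.lstsq(first, t, rcond=None)[0]  # stage 1
    second = np.column_stack([exog, t_hat])
    return np.linalg.lstsq(second, y, rcond=None)[0][-1]      # stage 2


# Simulated fused data (hypothetical): two sources with different assignment rules.
rng = np.random.default_rng(0)
n = 8000
src = rng.integers(0, 2, size=n)              # hidden source label, never observed
x = rng.normal(size=n)                        # observed covariate
u = rng.normal(size=n)                        # unmeasured confounder
a = np.where(src == 0, -4.0, 4.0)             # source-specific assignment intercept
b = np.where(src == 0, 0.5, -0.5)             # source-specific assignment slope
t = a + b * x + u + 0.5 * rng.normal(size=n)  # continuous treatment
y = 1.5 * t + x + 3.0 * u + rng.normal(size=n)  # true treatment effect is 1.5

resp = em_mixture_of_regressions(x, t)
giv = resp[:, 1:]  # drop one column: rows sum to 1, so all K would be collinear with the intercept
beta_iv = two_stage_least_squares(y, t, x, giv)
beta_ols = np.linalg.lstsq(np.column_stack([np.ones(n), x, t]), y, rcond=None)[0][-1]
print(f"OLS: {beta_ols:.3f}  GIV-2SLS: {beta_iv:.3f}")
```

In this toy setup OLS is pulled away from 1.5 by the confounder `u`, while 2SLS with the reconstructed group label should land close to it. One caveat of this simplified plug-in: the responsibilities are computed from `t`, which itself depends on `u`, so the reconstructed instrument is only approximately exogenous. The example works because the two sources are well separated and misassignments are rare; with strongly overlapping sources, confounding would leak into the instrument.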