Estimating the causal dose-response function is challenging, particularly when data from a single source are insufficient to estimate responses precisely across all exposure levels. To overcome this limitation, we propose a data fusion framework that leverages multiple data sources that are partially aligned with the target distribution. Specifically, we derive a Neyman-orthogonal loss function tailored for estimating the dose-response function within data fusion settings. To improve computational efficiency, we propose a stochastic approximation that retains orthogonality. We apply kernel ridge regression with this approximation, which provides closed-form estimators. Our theoretical analysis demonstrates that incorporating additional data sources yields tighter finite-sample regret bounds and improved worst-case performance, as confirmed via minimax lower bound comparison. Simulation studies validate the practical advantages of our approach, showing improved estimation accuracy when employing data fusion. This study highlights the potential of data fusion for estimating non-smooth parameters such as causal dose-response functions.
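As a rough illustration of why kernel ridge regression yields a closed-form estimator, the sketch below fits plain kernel ridge regression of an outcome on a scalar dose. It does not implement the Neyman-orthogonal loss or the stochastic approximation proposed here. The Gaussian kernel, the bandwidth, the ridge penalty, the synthetic data, and all function names are assumptions made only for this example.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=0.1):
    """Gaussian (RBF) kernel matrix between two 1-D arrays of doses."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))

def krr_fit(doses, targets, ridge=1e-3, bandwidth=0.1):
    """Closed-form coefficients: alpha = (K + n * ridge * I)^{-1} y."""
    n = doses.shape[0]
    K = gaussian_kernel(doses, doses, bandwidth)
    return np.linalg.solve(K + n * ridge * np.eye(n), targets)

def krr_predict(new_doses, doses, alpha, bandwidth=0.1):
    """Evaluate the fitted function at new dose levels."""
    return gaussian_kernel(new_doses, doses, bandwidth) @ alpha

# Synthetic data (assumed for illustration): a smooth curve plus noise.
rng = np.random.default_rng(0)
doses = rng.uniform(0.0, 1.0, size=200)
targets = np.sin(2 * np.pi * doses) + 0.1 * rng.normal(size=200)

alpha = krr_fit(doses, targets)
grid = np.linspace(0.05, 0.95, 10)
fitted = krr_predict(grid, doses, alpha)
```

In a data fusion setting, `targets` would presumably be replaced by pseudo-outcomes built from the orthogonal loss and pooled across sources; the linear solve itself would keep the same form.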