Randomized controlled trials (RCTs) are commonly regarded as the gold standard for causal inference and play a pivotal role in modern evidence-based medicine. However, their sample sizes are often too small to support significant causal conclusions for subgroups that are less prevalent in the population. In contrast, observational data are increasingly available in large volumes but can be biased by hidden confounding. Given these complementary features, we propose a power likelihood approach to augmenting RCTs with observational data for robust estimation of heterogeneous treatment effects. We provide a data-adaptive procedure that selects the influence factor, which regulates how much information is borrowed from the observational data, by maximizing the Expected Log Predictive Density (ELPD). We conduct a simulation study to illustrate the efficacy of our method and its favourable features compared with existing approaches. Lastly, we apply the proposed method to data from Tennessee's Student Teacher Achievement Ratio (STAR) Study to demonstrate its usefulness and practicality in real-world data analysis.
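The idea of a power likelihood with an ELPD-selected influence factor can be illustrated with a small sketch. It assumes a Gaussian linear outcome model y = X beta + noise with known noise variance, a conjugate normal prior on beta, and a design with columns [1, x, a, a*x] so that the heterogeneous treatment effect is beta[2] + beta[3] * x. The posterior is taken proportional to prior(beta) * L_rct(beta) * L_obs(beta)**eta, with eta in [0, 1] playing the role of the influence factor, and the ELPD is estimated here by K-fold cross-validation on the RCT sample only. These modelling choices, the simulated data, and all function names (power_posterior, cv_elpd, select_eta, simulate) are hypothetical illustrations, not the authors' implementation; the paper's model and ELPD estimator may differ.

```python
import numpy as np


def power_posterior(X_rct, y_rct, X_obs, y_obs, eta, sigma2=1.0, tau2=10.0):
    """Conjugate Gaussian posterior under an (assumed) power likelihood:
    p(beta | data) ∝ N(beta; 0, tau2 I) * L_rct(beta) * L_obs(beta)**eta,
    with y = X beta + N(0, sigma2) noise and sigma2 treated as known."""
    d = X_rct.shape[1]
    # Raising the Gaussian observational likelihood to the power eta
    # scales its contribution to the precision and the linear term by eta.
    precision = np.eye(d) / tau2 + (X_rct.T @ X_rct + eta * X_obs.T @ X_obs) / sigma2
    cov = np.linalg.inv(precision)
    mean = cov @ (X_rct.T @ y_rct + eta * X_obs.T @ y_obs) / sigma2
    return mean, cov


def log_predictive(X_new, y_new, mean, cov, sigma2=1.0):
    """Pointwise log posterior predictive density N(y; x'mean, sigma2 + x'cov x)."""
    mu = X_new @ mean
    var = sigma2 + np.einsum("ij,jk,ik->i", X_new, cov, X_new)
    return -0.5 * (np.log(2 * np.pi * var) + (y_new - mu) ** 2 / var)


def cv_elpd(X_rct, y_rct, X_obs, y_obs, eta, n_folds=5, seed=0, sigma2=1.0):
    """K-fold estimate of the ELPD on held-out RCT units (an assumed estimator).
    A fixed seed keeps the folds identical across candidate eta values."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(y_rct))
    total = 0.0
    for test_idx in np.array_split(rng.permutation(idx), n_folds):
        train = np.setdiff1d(idx, test_idx)
        mean, cov = power_posterior(X_rct[train], y_rct[train], X_obs, y_obs, eta, sigma2)
        total += log_predictive(X_rct[test_idx], y_rct[test_idx], mean, cov, sigma2).sum()
    return total


def select_eta(X_rct, y_rct, X_obs, y_obs, grid=np.linspace(0.0, 1.0, 21), **kw):
    """Pick the influence factor on a grid by maximizing the estimated ELPD."""
    scores = np.array([cv_elpd(X_rct, y_rct, X_obs, y_obs, eta, **kw) for eta in grid])
    return grid[np.argmax(scores)], scores


def simulate(n, rng, randomized, gamma=1.0):
    """Hypothetical data: hidden confounder u affects y; in the observational
    sample it also drives treatment a, so naive estimates of beta[2] are biased."""
    x = rng.normal(size=n)
    u = rng.normal(size=n)  # never included in the design matrix
    if randomized:
        a = rng.binomial(1, 0.5, size=n).astype(float)
    else:
        a = (u + rng.normal(size=n) > 0).astype(float)
    y = 1.0 + 0.5 * x + a * (1.0 + x) + gamma * u + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x, a, a * x])
    return X, y


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_r, y_r = simulate(200, rng, randomized=True)
    X_o, y_o = simulate(5000, rng, randomized=False)
    # sigma2 = 1 + gamma**2 = 2 is the residual variance in this simulation.
    eta_hat, scores = select_eta(X_r, y_r, X_o, y_o, sigma2=2.0)
    mean, _ = power_posterior(X_r, y_r, X_o, y_o, eta_hat, sigma2=2.0)
    print("selected eta:", eta_hat)
    print("treatment effect(x) ≈ %.2f + %.2f x" % (mean[2], mean[3]))
```

With eta = 0 the sketch reduces to an RCT-only analysis and with eta = 1 it pools both samples at full weight; when the observational sample is strongly confounded, as in this simulation, the cross-validated ELPD should favour small eta. A Gaussian conjugate model was chosen only so that the power posterior has a closed form; a real analysis would typically need a richer outcome model and sampling-based ELPD estimates.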
Translated title: Multi-Data: Combining Experimental and Observational Data via a Power Likelihood
Translated abstract:
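Randomized controlled trials are generally regarded as the gold standard for causal inference and play a key role in modern evidence-based medicine. However, their sample sizes are often too small to draw significant causal conclusions for subgroups that are less prevalent in the population. In contrast, observational data are becoming increasingly available in ever larger volumes, but may be biased owing to hidden confounding. Given these complementary properties, we propose a power likelihood approach that supplements experimental data with observational data to robustly estimate heterogeneous treatment effects. We provide a data-adaptive procedure that selects the influence factor best regulating the information from the observational data by maximizing the expected log predictive density (ELPD). We conduct a simulation study to illustrate our method and its advantages over existing methods. Finally, we apply the proposed method to data from Tennessee's Student Teacher Achievement Ratio (STAR) study to demonstrate its practicality in real-world data analysis.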