In social, medical, and behavioral research we often encounter datasets with a multilevel structure and multiple correlated dependent variables. These data are frequently collected from a study population that comprises several subpopulations with different (i.e., heterogeneous) effects of an intervention. Despite the frequent occurrence of such data, methods to analyze them are less common, and researchers often resort to ignoring the multilevel and/or heterogeneous structure, analyzing only a single dependent variable, or a combination of these. These analysis strategies are suboptimal: Ignoring multilevel structures inflates Type I error rates, while neglecting the multivariate or heterogeneous structure masks detailed insights. To analyze such data comprehensively, the current paper presents a novel Bayesian multilevel multivariate logistic regression model. The model takes the clustered structure of multilevel data into account, such that posterior inferences can be made with accurate error rates. Further, the model shares information between different subpopulations in the estimation of average and conditional average multivariate treatment effects. To facilitate interpretation, multivariate logistic regression parameters are transformed to posterior success probabilities and differences between them. A numerical evaluation compared our framework to less comprehensive alternatives and highlighted the need to model the multilevel structure: Treatment comparisons based on the multilevel model attained the targeted Type I error rates, while single-level alternatives resulted in inflated Type I error rates. A re-analysis of the Third International Stroke Trial data illustrated how incorporating a multilevel structure, assessing treatment heterogeneity, and combining dependent variables contributed to an in-depth understanding of treatment effects.
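The transformation from regression parameters to success probabilities and their differences can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: two binary outcomes, a multinomial-logit parametrization over the four joint response categories with (0, 0) as the reference, a single binary treatment indicator, and no random effects or covariates. The function names (`joint_probs`, `success_probs`, `treatment_difference`) are hypothetical, and the posterior draws are synthetic placeholders rather than output from the model in the paper.

```python
import numpy as np

def joint_probs(eta):
    """Softmax over joint response categories.

    eta: array of shape (draws, 3) with linear predictors for the
    categories (1,1), (1,0), (0,1); the reference category (0,0)
    has linear predictor 0 (an assumption of this sketch).
    """
    eta_full = np.concatenate([eta, np.zeros((eta.shape[0], 1))], axis=1)
    eta_full = eta_full - eta_full.max(axis=1, keepdims=True)  # numerical stability
    expo = np.exp(eta_full)
    return expo / expo.sum(axis=1, keepdims=True)

def success_probs(phi):
    """Marginal success probabilities for the two outcomes.

    phi: joint response probabilities, columns ordered (1,1), (1,0), (0,1), (0,0).
    """
    theta1 = phi[:, 0] + phi[:, 1]  # outcome 1 succeeds
    theta2 = phi[:, 0] + phi[:, 2]  # outcome 2 succeeds
    return np.stack([theta1, theta2], axis=1)

def treatment_difference(intercepts, slopes):
    """Per-draw difference in success probabilities, treated minus control.

    intercepts, slopes: arrays of shape (draws, 3) holding draws of the
    regression coefficients; the linear predictor is intercept + slope * t
    for treatment indicator t in {0, 1}.
    """
    theta_control = success_probs(joint_probs(intercepts))
    theta_treated = success_probs(joint_probs(intercepts + slopes))
    return theta_treated - theta_control

rng = np.random.default_rng(0)
# Synthetic stand-ins for MCMC draws; not results from any real analysis.
intercepts = rng.normal(0.0, 0.1, size=(4000, 3))
slopes = rng.normal([0.5, 0.2, 0.1], 0.1, size=(4000, 3))

delta = treatment_difference(intercepts, slopes)
post_mean = delta.mean(axis=0)            # posterior mean difference per outcome
prob_superior = (delta > 0).mean(axis=0)  # posterior probability of a positive difference
print(post_mean, prob_superior)
```

Because the transformation is applied to every draw, the resulting differences form a posterior sample on the probability scale, so treatment comparisons can be read off directly as posterior probabilities rather than interpreted through log-odds.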