We propose a distributed quadratic inference function framework to jointly estimate regression parameters from multiple potentially heterogeneous data sources with correlated vector outcomes. The primary goal of this joint integrative analysis is to estimate covariate effects on all outcomes through a marginal regression model in a statistically and computationally efficient way. We develop a data integration procedure for statistical estimation and inference of regression parameters that is implemented in a fully distributed and parallelized computational scheme. To overcome the computational and modeling challenges arising from the high-dimensional likelihood of the correlated vector outcomes, we propose to analyze each data source separately using the quadratic inference functions of Qu, Lindsay and Li (2000), and then to jointly re-estimate the parameters from all data sources with a combined meta-estimator that accounts for correlation between data sources, in a similar spirit to the generalized method of moments of Hansen (1982). We show both theoretically and numerically that the proposed method yields efficiency improvements and is computationally fast. We illustrate the methodology with a joint integrative analysis of the association between smoking and metabolites in a large multi-cohort study, and we provide an R package for ease of implementation.
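The two-stage idea (a quadratic inference function fit per data source, then a combined estimator that weights the per-source estimates by their joint covariance) can be illustrated with a minimal NumPy sketch. It is not the authors' R package or their exact meta-estimator. It assumes a Gaussian linear marginal model with identity link and unit variance function, two illustrative basis matrices (identity and AR(1)-type off-diagonals), and sources that are blocks of outcomes measured on the same subjects, so that between-source correlation can be estimated from paired per-subject influence terms. The function names `qif_linear` and `combine` are hypothetical, and the combination step is the standard GMM/GLS-type optimal weighting of correlated estimators of a shared parameter.

```python
import numpy as np

def qif_linear(X, Y, n_iter=20):
    """Hypothetical QIF fit for one source: Gaussian linear model, identity link.

    X: (N, m, p) covariates, Y: (N, m) outcomes. Returns (beta_hat, B, g), where
    g holds per-subject extended scores (N, 2p) and B (p, 2p) linearises
    beta_hat - beta as B @ mean(g).
    """
    N, m, p = X.shape
    # Assumed basis matrices: identity and AR(1)-type first off-diagonals.
    basis = [np.eye(m), np.eye(m, k=1) + np.eye(m, k=-1)]
    # G[i] stacks X_i^T M_r over the basis matrices: shape (N, 2p, m).
    G = np.concatenate([np.einsum('ijp,jk->ipk', X, M) for M in basis], axis=1)
    b = np.einsum('ipk,ik->ip', G, Y).mean(axis=0)    # mean of G_i y_i, (2p,)
    A = np.einsum('ipk,ikq->ipq', G, X).mean(axis=0)  # mean of G_i X_i, (2p, p)

    def scores(beta):
        resid = Y - X @ beta                          # (N, m) residuals
        return np.einsum('ipk,ik->ip', G, resid)      # (N, 2p) extended scores

    # Start from ordinary least squares (working independence).
    beta = np.linalg.lstsq(X.reshape(-1, p), Y.reshape(-1), rcond=None)[0]
    for _ in range(n_iter):
        g = scores(beta)
        Ci = np.linalg.inv(g.T @ g / N)               # inverse of C_N(beta)
        # With C held fixed, gbar(beta) = b - A beta is linear in beta, so the
        # QIF minimiser is a weighted least-squares solve.
        beta = np.linalg.solve(A.T @ Ci @ A, A.T @ Ci @ b)
    g = scores(beta)
    Ci = np.linalg.inv(g.T @ g / N)
    B = np.linalg.solve(A.T @ Ci @ A, A.T @ Ci)
    return beta, B, g

def combine(fits):
    """Combine K per-source estimates of a shared beta by GLS/GMM-type weighting."""
    betas = np.concatenate([beta for beta, _, _ in fits])        # (K p,)
    # Per-subject influence terms B_k g_ik, side by side: (N, K p).
    psi = np.concatenate([g @ B.T for _, B, g in fits], axis=1)
    N = psi.shape[0]
    V = psi.T @ psi / N**2          # joint covariance, incl. between-source blocks
    K, p = len(fits), fits[0][0].size
    H = np.tile(np.eye(p), (K, 1))  # (K p, p): every source estimates the same beta
    Vi = np.linalg.inv(V)
    cov_c = np.linalg.inv(H.T @ Vi @ H)
    beta_c = cov_c @ H.T @ Vi @ betas
    return beta_c, cov_c

# Simulated example: two outcome blocks on the same subjects, correlated via u.
rng = np.random.default_rng(0)
N, m, beta_true = 500, 4, np.array([1.0, -0.5])
u = rng.normal(size=(N, 1))          # subject effect shared by both sources
fits = []
for _ in range(2):
    x = rng.normal(size=(N, m))
    X = np.stack([np.ones((N, m)), x], axis=2)   # (N, m, 2): intercept and slope
    Y = X @ beta_true + u + rng.normal(size=(N, m))
    fits.append(qif_linear(X, Y))    # each source fitted independently
beta_c, cov_c = combine(fits)
print(beta_c, np.sqrt(np.diag(cov_c)))
```

Each `qif_linear` call touches only its own source's data, so these fits could run in parallel; only the estimates and per-subject influence terms are passed to `combine`. The exchangeable basis is avoided here because, with an intercept and unit variance, its intercept score row is proportional to the identity one, which makes the weight matrix singular.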