Bayesian predictive modeling of multi-source multi-way data

We develop a Bayesian approach to predict a continuous or binary outcome from data that are collected from multiple sources with a multi-way (i.e., multidimensional tensor) structure. As a motivating example, we consider molecular data from multiple 'omics sources, each measured over multiple developmental time points, as predictors of early-life iron deficiency (ID) in a rhesus monkey model. We use a linear model with a low-rank structure on the coefficients to capture multi-way dependence, and we model the variance of the coefficients separately for each source to infer the sources' relative contributions. Conjugate priors facilitate an efficient Gibbs sampling algorithm for posterior inference, assuming a continuous outcome with normal errors or a binary outcome with a probit link. Simulations demonstrate that our model performs as expected in terms of misclassification rates and correlation of estimated coefficients with true coefficients, with large gains in performance from incorporating multi-way structure and modest gains from accounting for differing signal sizes across the sources. Moreover, it provides robust classification of ID monkeys for our motivating application. Software in the form of R code is available at https://github.com/BiostatsKim/BayesMSMW .
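To make the model structure concrete, the following is a minimal illustrative sketch in Python (NumPy), not the authors' R implementation. It assumes that the low-rank structure takes the form of a sum of R outer products between feature loadings and time-point loadings, which the abstract does not specify. All names, dimensions, and variance values (`p_sizes`, `tau`, `low_rank_coefficients`, and so on) are hypothetical. The sketch only simulates data from such a model; it does not implement the Gibbs sampler.

```python
import numpy as np
from math import erf, sqrt


def low_rank_coefficients(U, V):
    """Rank-R coefficient matrix B = sum_r u_r v_r^T (features x time points)."""
    return U @ V.T


def linear_predictor(X, B):
    """Inner product <X_i, B> for each sample; X has shape (n, p, d)."""
    return np.einsum("ipd,pd->i", X, B)


def probit_probability(eta):
    """Standard normal CDF applied elementwise (the inverse probit link)."""
    return 0.5 * (1.0 + np.vectorize(erf)(eta / sqrt(2.0)))


rng = np.random.default_rng(0)
n, d, R = 50, 4, 2        # samples, time points, assumed rank (all hypothetical)
p_sizes = [30, 20]        # hypothetical number of features in each of two sources
tau = [1.0, 0.3]          # hypothetical source-specific standard deviations

# Feature loadings are drawn with a separate scale per source, so the
# sources contribute signals of different sizes.
U = np.vstack([rng.normal(0.0, t, size=(p, R)) for p, t in zip(p_sizes, tau)])
V = rng.normal(size=(d, R))           # time-point loadings
B = low_rank_coefficients(U, V)       # shape (sum(p_sizes), d), rank at most R

X = rng.normal(size=(n, sum(p_sizes), d))   # stacked multi-source, multi-way predictors
eta = linear_predictor(X, B)

y_continuous = eta + rng.normal(size=n)     # continuous outcome with normal errors
prob = probit_probability(eta)
y_binary = rng.binomial(1, prob)            # binary outcome with a probit link
```

Under this assumed factorization, the coefficient array has (sum(p_sizes) + d) * R free parameters rather than sum(p_sizes) * d, which is one way to see why exploiting multi-way structure can help when the number of features is large relative to the sample size.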
