Multimodal data, where different types of data are collected from the same subjects, are rapidly emerging in a wide variety of scientific applications. Factor analysis is commonly used in the integrative analysis of multimodal data, and is particularly useful for overcoming the curse of high dimensionality and high correlations. However, there is little work on statistical inference for supervised modeling of multimodal data based on factor analysis. In this article, we consider an integrative linear regression model that is built upon the latent factors extracted from multimodal data. We address three important questions: how to infer the significance of one data modality given the other modalities in the model; how to infer the significance of a combination of variables from one modality or across different modalities; and how to quantify the contribution, measured by the goodness-of-fit, of one data modality given the others. When answering each question, we explicitly characterize both the benefit and the extra cost of factor analysis. These questions, to our knowledge, have not yet been addressed despite the wide use of factor analysis in integrative multimodal analysis, and our proposal bridges an important gap. We study the empirical performance of our methods through simulations, and further illustrate them with a multimodal neuroimaging analysis.
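To make the setting concrete, the following is a minimal sketch of a regression built on factors extracted from two modalities. It is not the procedure proposed in the article: it uses principal components as a stand-in for the factor estimates and a classical nested-model F statistic, which ignores the extra cost of estimating the factors that the article explicitly characterizes. All names (`extract_factors`, `modality_f_stat`), the simulated data, and the number of factors are assumptions made for illustration.

```python
import numpy as np

def extract_factors(X, k):
    """Top-k principal-component factors of an (n x p) data matrix (assumed estimator)."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * np.sqrt(X.shape[0])  # scaled so that F'F / n = I

def rss(y, Z):
    """Residual sum of squares from least squares of y on Z plus an intercept."""
    Z1 = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    resid = y - Z1 @ coef
    return float(resid @ resid)

def modality_f_stat(y, factors, m):
    """Naive F statistic and partial R^2 for modality m given the other modalities.

    Treats the estimated factors as if they were observed covariates.
    Requires at least two modalities.
    """
    n = len(y)
    full = np.column_stack(factors)
    reduced = np.column_stack([F for i, F in enumerate(factors) if i != m])
    rss_full, rss_red = rss(y, full), rss(y, reduced)
    q = factors[m].shape[1]        # number of factors being tested
    df = n - full.shape[1] - 1     # residual degrees of freedom of the full model
    f_stat = ((rss_red - rss_full) / q) / (rss_full / df)
    partial_r2 = (rss_red - rss_full) / rss_red
    return f_stat, partial_r2

# Simulated example (hypothetical data): only modality 1 drives the response.
rng = np.random.default_rng(0)
n, p, k = 200, 50, 2
f1, f2 = rng.normal(size=(n, k)), rng.normal(size=(n, k))
X1 = f1 @ rng.normal(size=(k, p)) + rng.normal(size=(n, p))
X2 = f2 @ rng.normal(size=(k, p)) + rng.normal(size=(n, p))
y = f1 @ np.array([1.0, -1.0]) + rng.normal(size=n)

factors = [extract_factors(X1, k), extract_factors(X2, k)]
f_mod1, r2_mod1 = modality_f_stat(y, factors, 0)
f_mod2, r2_mod2 = modality_f_stat(y, factors, 1)
print(f"modality 1: F = {f_mod1:.2f}, partial R^2 = {r2_mod1:.3f}")
print(f"modality 2: F = {f_mod2:.2f}, partial R^2 = {r2_mod2:.3f}")
```

In this simulation the response depends only on the factors of the first modality, so its statistic and partial R^2 should come out much larger than those of the second. The partial R^2 here is one simple goodness-of-fit measure of a modality's contribution given the others; the article's own measure and its inference may differ.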