Data sets in which measurements of two (or more) types are obtained from a common set of samples arise in many scientific applications. A common problem in the exploratory analysis of such data is to identify groups of features of different data types that are strongly associated. A bimodule is a pair (A, B) of feature sets from two data types such that the aggregate cross-correlation between the features in A and those in B is large. A bimodule (A, B) is stable if A coincides with the set of features that have significant aggregate correlation with the features in B, and vice versa. In this paper we propose and investigate an iterative testing-based procedure (BSP) to identify stable bimodules in bi-view data. We carry out a thorough simulation study to assess the performance of BSP, and present an extended application to the problem of expression quantitative trait loci (eQTL) analysis using recent data from the GTEx project. In addition, we apply BSP to climatology data to identify regions in North America where annual temperature variation affects precipitation.
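The stability condition is a fixed point: A must equal the set of features significantly associated with B, and B must equal the set of features significantly associated with A. The sketch below illustrates that fixed-point iteration and is not the authors' BSP. Every specific here is an assumption. The aggregate-correlation test correlates each feature with the sum of the standardized features in the other set and applies Fisher's z-transform. Multiple testing is controlled with Benjamini-Hochberg at level `alpha`. The function names (`aggregate_pvalues`, `benjamini_hochberg`, `search_bimodule`) are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def standardize(col):
    """Center a feature to mean 0 and scale it to (population) SD 1."""
    m, s = fmean(col), pstdev(col)
    return [(v - m) / s for v in col]


def aggregate_pvalues(U, V, S):
    """Two-sided p-value for each feature in U against the aggregate of V[j], j in S.

    Features are stored column-wise: U[i] is a list of n sample values.
    Assumed statistic (hypothetical): Pearson correlation between U[i] and the
    sum of the standardized features in S, tested via Fisher's z-transform.
    """
    n = len(V[0])
    zs = [standardize(V[j]) for j in S]
    agg = standardize([sum(col[t] for col in zs) for t in range(n)])
    normal = NormalDist()
    pvals = []
    for col in U:
        zu = standardize(col)
        r = fmean(a * b for a, b in zip(zu, agg))  # Pearson correlation
        r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
        z = math.sqrt(n - 3) * math.atanh(r)
        pvals.append(2.0 * (1.0 - normal.cdf(abs(z))))
    return pvals


def benjamini_hochberg(pvals, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank  # largest rank passing its threshold
    return set(order[:k])


def search_bimodule(X, Y, B0, alpha=0.05, max_iter=50):
    """Alternate A <- S_X(B), B <- S_Y(A) until B stops changing.

    Returns (A, B) at a fixed point, (set(), set()) if either side empties,
    or None if no fixed point is reached within max_iter iterations.
    """
    B = set(B0)
    for _ in range(max_iter):
        if not B:
            return set(), set()
        A = benjamini_hochberg(aggregate_pvalues(X, Y, B), alpha)
        if not A:
            return set(), set()
        B_new = benjamini_hochberg(aggregate_pvalues(Y, X, A), alpha)
        if B_new == B:  # A = S_X(B) and B = S_Y(A): a stable pair
            return A, B
        B = B_new
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    f = [rng.gauss(0, 1) for _ in range(n)]  # shared latent signal

    def view(n_signal, n_noise):
        signal = [[x + rng.gauss(0, 1) for x in f] for _ in range(n_signal)]
        noise = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_noise)]
        return signal + noise

    X, Y = view(3, 7), view(3, 5)  # first 3 features of each view share f
    print(search_bimodule(X, Y, B0={0}, alpha=0.001))
```

Starting from a single seed feature, the iteration should grow A and B toward the features that share the latent signal. Because the procedure is deterministic but not guaranteed to converge, the sketch caps the number of iterations and reports non-convergence explicitly rather than returning the last iterate.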