Data sets in which measurements of different types are obtained from a common set of samples appear in many scientific applications. In the analysis of such data, an important problem is to identify groups of features from different data types that are strongly associated. Given two data types, a bimodule is a pair $(A,B)$ of feature sets from the two types such that the aggregate cross-correlation between the features in $A$ and those in $B$ is large. A bimodule $(A,B)$ is stable if $A$ coincides with the set of features that have significant aggregate correlation with the features in $B$, and vice versa. We develop an iterative, testing-based procedure called BSP to identify stable bimodules. BSP relies on approximate p-values derived from the permutation moments of sums of squared sample correlations between a single feature of one type and a group of features of the second type. We carry out a thorough simulation study to assess the performance of BSP, and present an extended application to the problem of expression quantitative trait loci (eQTL) analysis using recent data from the GTEx project. In addition, we apply BSP to climatology data to identify regions in North America where annual temperature variation affects precipitation.
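The stability condition and the test statistic described above can be illustrated with a minimal sketch. This is not the BSP implementation. The function names (`sq_corr_sums`, `perm_pvalues`, `select`, `find_bimodule`) are hypothetical, and so are the fixed threshold `alpha` and the absence of any multiple-testing correction. Most importantly, the p-values here come from brute-force Monte Carlo permutation of the sample labels, not from the moment-based approximation that BSP relies on. The sketch also assumes that no feature is constant across samples.

```python
import numpy as np

def sq_corr_sums(X, Y, B):
    """For each column of X, sum the squared sample correlations
    with the columns of Y indexed by B."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    YB = Y[:, B]
    Ys = (YB - YB.mean(axis=0)) / YB.std(axis=0)
    R = Xs.T @ Ys / X.shape[0]          # Pearson correlations, shape (p, |B|)
    return (R ** 2).sum(axis=1)

def perm_pvalues(X, Y, B, n_perm=200, rng=None):
    """Monte Carlo permutation p-values (a stand-in for the
    moment-based approximation mentioned in the abstract)."""
    rng = np.random.default_rng(rng)
    obs = sq_corr_sums(X, Y, B)
    exceed = np.zeros_like(obs)
    for _ in range(n_perm):
        perm = rng.permutation(X.shape[0])   # break the sample pairing
        exceed += sq_corr_sums(X[perm], Y, B) >= obs
    return (1 + exceed) / (1 + n_perm)

def select(X, Y, B, alpha, n_perm, rng):
    """Indices of X features significantly associated with Y[:, B].
    Assumption: plain threshold, no multiplicity correction."""
    return np.flatnonzero(perm_pvalues(X, Y, B, n_perm, rng) <= alpha)

def find_bimodule(X, Y, B0, alpha=0.01, n_perm=200, max_iter=20, seed=0):
    """Alternate between the two data types until (A, B) stops changing.
    Returns (A, B, stable_flag)."""
    rng = np.random.default_rng(seed)
    empty = np.array([], dtype=int)
    A, B = empty, np.unique(B0)
    for _ in range(max_iter):
        A_new = select(X, Y, B, alpha, n_perm, rng)
        if A_new.size == 0:
            return empty, empty, False
        B_new = select(Y, X, A_new, alpha, n_perm, rng)
        if B_new.size == 0:
            return empty, empty, False
        if np.array_equal(A_new, A) and np.array_equal(B_new, B):
            return A, B, True            # fixed point: A = sig(B), B = sig(A)
        A, B = A_new, B_new
    return A, B, False
```

Here `X` and `Y` are sample-by-feature arrays sharing the same rows, and `B0` is an initial set of column indices of `Y`. Because the p-values are simulated, the stopping test is itself noisy, so the returned flag should be read as "unchanged between two consecutive passes" and not as a formal guarantee of stability.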