In the age of big data, data integration is a critical step, especially for understanding how diverse data types work together and how they work separately. Among data integration methods, the Angle-Based Joint and Individual Variation Explained (AJIVE) approach is particularly attractive because it studies not only joint behavior but also individual behavior. AJIVE scores typically indicate important relationships between data objects, such as clusters. An important challenge is understanding which features, i.e., variables, are associated with those relationships. We address this challenge by proposing a hypothesis test for assessing the statistical significance of features. The new test is inspired by the related jackstraw method developed for Principal Component Analysis. We use a high-dimensional multi-genomic cancer data set both to motivate and to illustrate the methodology.