For relevance analysis, the conventional Tukey's test can be applied to the set of all pairwise comparisons. However, few studies discuss both nonparametric k-sample comparisons and relevance analysis in high dimensions. Our aim is to capture the degree of relevance between combined samples and to provide additional insight into high-dimensional k-sample comparisons. To this end, we extend a graph-based two-sample comparison to the k-sample setting and investigate its applicability to large and unequal sample sizes. We propose two distribution-free test statistics based on between-sample edge counts and measure the degree of relevance by the standardized counts. We derive the asymptotic permutation null distributions of the proposed statistics and prove a power gain when the sample sizes are smaller than the square root of the dimension. We also discuss how different edge costs in the graph can be used to compare the parameters of the distributions. Simulation comparisons and real data analyses of tumors and images further demonstrate the value of the proposed method. Software implementing the relevance analysis is available in the R package Relevance.
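To make the idea of between-sample edge counts concrete, the following is a minimal sketch and not the implementation in the Relevance package. It assumes a Euclidean minimum spanning tree as the similarity graph and standardizes each pairwise count by Monte Carlo label permutation rather than by the asymptotic null distribution derived in the paper. The function names (`mst_edges`, `pair_counts`, `standardized_counts`) and the toy data are hypothetical and chosen only for illustration.

```python
import itertools
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns n - 1 edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def pair_counts(edges, labels, k):
    """Count edges joining sample a and sample b for every pair a < b."""
    counts = {pair: 0 for pair in itertools.combinations(range(k), 2)}
    for i, j in edges:
        a, b = sorted((labels[i], labels[j]))
        if a != b:
            counts[(a, b)] += 1
    return counts


def standardized_counts(points, labels, k, n_perm=500, seed=0):
    """Observed pairwise counts and their permutation z-scores."""
    rng = random.Random(seed)
    edges = mst_edges(points)            # the graph is fixed; only labels move
    observed = pair_counts(edges, labels, k)
    perm = {pair: [] for pair in observed}
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        for pair, c in pair_counts(edges, shuffled, k).items():
            perm[pair].append(c)
    z = {}
    for pair, values in perm.items():
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
        z[pair] = (observed[pair] - mean) / math.sqrt(var) if var > 0 else 0.0
    return observed, z


# Toy data: samples 0 and 1 interleave on a line, sample 2 lies far away.
points = [(float(x),) for x in range(6)] + [(100.0,), (101.0,), (102.0,)]
labels = [0, 1, 0, 1, 0, 1, 2, 2, 2]
observed, z = standardized_counts(points, labels, k=3)
print(observed)
print(z)
```

In this toy example the interleaved samples 0 and 1 share many tree edges, so their standardized count is large, while sample 2 is joined to the rest by a single edge. Under this sketch, a high standardized count for a pair suggests that the two samples are well mixed, and a strongly negative one suggests that they are separated.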