Rank-based approaches are among the most popular nonparametric methods for univariate data in statistical problems such as hypothesis testing, owing to their robustness and effectiveness. However, they are unsatisfactory for more complex data. In the era of big data, high-dimensional and non-Euclidean data, such as networks and images, are ubiquitous and pose challenges for statistical analysis. Existing multivariate ranks, such as component-wise ranks, spatial ranks, and depth-based ranks, do not apply to non-Euclidean data and have limited performance for high-dimensional data. Instead of dealing with the ranks of observations, we propose two types of ranks applicable to complex data, both based on a similarity graph constructed on the observations: a graph-induced rank defined by the inductive nature of the graph, and an overall rank defined by the weights of the edges in the graph. To illustrate their use, both new ranks are used to construct test statistics for two-sample hypothesis testing. These statistics converge to the $\chi_2^2$ distribution under the permutation null distribution and some mild conditions on the ranks, enabling easy type-I error control. Simulation studies show that the new method exhibits good power under a wide range of alternatives compared to existing methods. The new test is illustrated on the New York City taxi data, comparing travel patterns in consecutive months, and on a brain network dataset, comparing male and female subjects.
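To make the idea of a graph-based rank test concrete, the following is a minimal sketch and not the construction used in the paper. It assumes a Euclidean $k$-nearest-neighbor graph as the similarity graph, assigns rank $k$ to an observation's nearest neighbor down to rank 1 for its $k$-th nearest neighbor, and symmetrizes the result. The statistic is a quadratic form in the two within-sample rank sums, and its permutation mean and covariance are estimated by Monte Carlo rather than derived analytically. All function names and parameters (`knn_rank_matrix`, `within_sums`, `rank_two_sample_test`, `k`, `n_perm`) are hypothetical. The only fact borrowed from the abstract is the $\chi_2^2$ reference distribution, whose survival function is $e^{-t/2}$.

```python
import math
import random


def knn_rank_matrix(points, k):
    """Symmetric rank matrix from a k-nearest-neighbor graph (illustrative assumption)."""
    n = len(points)
    ranks = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Other observations ordered from nearest to farthest from observation i.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for pos, j in enumerate(order[:k]):
            # Nearest neighbor gets rank k, the k-th nearest gets rank 1.
            ranks[i][j] = float(k - pos)
    # Symmetrize so that R[i][j] == R[j][i].
    return [[0.5 * (ranks[i][j] + ranks[j][i]) for j in range(n)]
            for i in range(n)]


def within_sums(R, labels):
    """Sum of ranks over unordered pairs within sample 0 and within sample 1."""
    u = [0.0, 0.0]
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                u[labels[i]] += R[i][j]
    return u


def rank_two_sample_test(x, y, k=5, n_perm=500, seed=0):
    """Return (statistic, approximate p-value) for a two-sample comparison."""
    rng = random.Random(seed)
    R = knn_rank_matrix(list(x) + list(y), k)
    labels = [0] * len(x) + [1] * len(y)
    observed = within_sums(R, labels)

    # Monte Carlo estimate of the permutation mean and covariance
    # (an assumption of this sketch; no closed-form moments are used).
    draws = []
    shuffled = labels[:]
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        draws.append(within_sums(R, shuffled))
    mean = [sum(d[a] for d in draws) / n_perm for a in range(2)]
    cov = [[sum((d[a] - mean[a]) * (d[b] - mean[b]) for d in draws) / (n_perm - 1)
            for b in range(2)] for a in range(2)]

    # Quadratic form d^T cov^{-1} d, using the explicit 2x2 inverse.
    d0, d1 = observed[0] - mean[0], observed[1] - mean[1]
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    stat = (cov[1][1] * d0 * d0 - 2.0 * cov[0][1] * d0 * d1
            + cov[0][0] * d1 * d1) / det

    # Survival function of the chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-stat / 2.0)
    return stat, p_value
```

Calling `rank_two_sample_test(x, y)` with two lists of coordinate tuples returns the statistic and an approximate p-value. For non-Euclidean data such as networks, `math.dist` would have to be replaced by a dissimilarity suited to the data, which this sketch does not attempt.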