The categorical Gini correlation proposed by Dang et al. is a dependence measure that characterizes independence between a categorical variable and a numerical variable. The asymptotic distributions of the sample correlation under dependence and independence have been established when the dimension of the numerical variable is fixed. However, its asymptotic behavior for high-dimensional data has not been explored. In this paper, we develop a central limit theorem for the Gini correlation in the more realistic setting where the dimension of the numerical variable diverges. Based on this asymptotic normality, we then construct a powerful and consistent test for the $K$-sample problem. The proposed test not only avoids the computational burden of the permutation procedure but also gains power over it. Simulation studies and real data illustrations show that the proposed test is more competitive than existing methods across a broad range of realistic situations, especially in unbalanced cases.
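For orientation, the sample categorical Gini correlation and the permutation baseline mentioned above can be sketched as follows. This is a minimal illustration, not the test proposed in the paper: it assumes the usual form of the measure, $\hat\rho_g = 1 - \sum_k \hat p_k \hat\Delta_k / \hat\Delta$, where $\hat\Delta$ is the average pairwise Euclidean distance over all observations, $\hat\Delta_k$ is the same quantity within class $k$, and $\hat p_k = n_k/n$. The function names (`gini_mean_difference`, `gini_correlation`, `permutation_pvalue`) are hypothetical, and the centering and scaling constants of the asymptotic normal test are not given in the abstract, so they are not reproduced here.

```python
import numpy as np


def gini_mean_difference(x):
    """Average Euclidean distance over all pairs of distinct rows of x (n x d)."""
    n = x.shape[0]
    diffs = x[:, None, :] - x[None, :, :]          # shape (n, n, d)
    dists = np.sqrt((diffs ** 2).sum(axis=-1))     # shape (n, n), zero diagonal
    return dists.sum() / (n * (n - 1))             # mean over ordered pairs i != j


def gini_correlation(x, y):
    """Sample categorical Gini correlation between numeric x and labels y.

    Assumes every class has at least two observations.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                             # treat a vector as n x 1
    y = np.asarray(y)
    n = x.shape[0]
    overall = gini_mean_difference(x)
    within = 0.0
    for label in np.unique(y):
        xk = x[y == label]
        within += (xk.shape[0] / n) * gini_mean_difference(xk)
    return 1.0 - within / overall


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value for the K-sample null (the baseline, not the new test)."""
    rng = np.random.default_rng(seed)
    observed = gini_correlation(x, y)
    exceed = sum(
        gini_correlation(x, rng.permutation(y)) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)
```

Each permutation recomputes all $O(n^2)$ pairwise distances, so the cost grows linearly in `n_perm`; this is the computational burden that a test based on a normal limit avoids.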