In this paper we propose and study a class of nonparametric yet interpretable measures of association between two random vectors $X$ and $Y$ taking values in $\mathbb{R}^{d_1}$ and $\mathbb{R}^{d_2}$ respectively ($d_1, d_2\ge 1$). These measures, defined using the theory of reproducing kernel Hilbert spaces coupled with optimal transport, capture the strength of dependence between $X$ and $Y$: they equal 0 if and only if the variables are independent and 1 if and only if one variable is a measurable function of the other. Further, these population measures can be consistently estimated using the general framework of geometric graphs, which includes $k$-nearest neighbor graphs and minimum spanning trees. Additionally, these measures can be readily used to construct an exact finite-sample distribution-free test of mutual independence between $X$ and $Y$. In fact, as far as we are aware, these are the only procedures that possess all the above-mentioned desirable properties. The correlation coefficients proposed in Dette et al. (2013), Chatterjee (2021), and Azadkia and Chatterjee (2021) can, at the population level, be seen as special cases of this general class of measures.
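To make the "0 under independence, 1 under functional dependence" behavior concrete, the following is a minimal sketch of the univariate rank coefficient of Chatterjee (2021), one of the special cases named above. It is not the kernel and geometric-graph estimator studied in this paper. It uses the no-ties formula $\xi_n = 1 - 3\sum_{i=1}^{n-1}|r_{i+1}-r_i|/(n^2-1)$, where $r_i$ is the rank of the $Y$ value paired with the $i$-th smallest $X$. The function name `chatterjee_xi` and the simulated data are illustrative assumptions.

```python
import numpy as np


def chatterjee_xi(x, y):
    """Chatterjee's rank coefficient for 1-D samples, assuming no ties in x or y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Reorder the pairs so that x is increasing.
    order = np.argsort(x, kind="stable")
    # Ranks (1..n) of the y values, taken in that x-sorted order.
    ranks = np.argsort(np.argsort(y[order])) + 1
    # Large jumps between consecutive ranks indicate weak dependence.
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)


# Illustrative data (assumed, not from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
xi_independent = chatterjee_xi(x, rng.normal(size=2000))  # y independent of x
xi_functional = chatterjee_xi(x, np.cos(x))  # y is a non-monotone function of x
```

In this sketch `xi_independent` should be close to 0 and `xi_functional` close to 1, even though $\cos$ is not monotone. This is the qualitative behavior the abstract describes for the general class of measures.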