Finding meaningful ways to measure the statistical dependency between random variables $\xi$ and $\zeta$ is a timeless statistical endeavor. In recent years, several novel concepts, such as the distance covariance, have extended classical notions of dependency to more general settings. In this article, we propose and study an alternative framework that is based on optimal transport. The transport dependency $\tau \ge 0$ applies to general Polish spaces and intrinsically respects metric properties. For suitable ground costs, independence is fully characterized by $\tau = 0$. Via proper normalization of $\tau$, three transport correlations $\rho_\alpha$, $\rho_\infty$, and $\rho_*$ with values in $[0, 1]$ are defined. They attain the value $1$ if and only if $\zeta = \varphi(\xi)$, where $\varphi$ is an $\alpha$-Lipschitz function for $\rho_\alpha$, a measurable function for $\rho_\infty$, or a multiple of an isometry for $\rho_*$. The transport dependency can be estimated consistently by an empirical plug-in approach; alternative estimators with the same convergence rate but significantly lower computational cost are also proposed. Numerical results suggest that $\tau$ robustly recovers dependency between data sets with different internal metric structures. The use of $\tau$ for inferential tasks, such as transport-dependency-based independence testing, is illustrated on a data set from a cancer study.
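The empirical plug-in approach mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out the definition of $\tau$, so the code below rests on two assumptions: that $\tau$ is the optimal transport cost between the joint distribution of $(\xi, \zeta)$ and the product of its marginals, and that the ground cost is the additive cost $d_X(x, x') + d_Y(y, y')$ with Euclidean distances. The function name `transport_dependency_plugin` is hypothetical, and the reduction to an assignment problem is only one convenient way to solve the small transport problem; it is not the reduced-cost estimator the abstract refers to.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def transport_dependency_plugin(x, y):
    """Plug-in estimate of tau from n paired samples (hypothetical helper).

    Assumption: tau is the optimal transport cost between the empirical
    joint distribution and the product of the empirical marginals, under
    the additive ground cost d_X + d_Y with Euclidean distances.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)

    # Pairwise Euclidean distances within each sample, shape (n, n).
    dx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    dy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)

    # cost[i, j, k] is the cost of moving the joint atom (x_i, y_i)
    # to the product atom (x_j, y_k).
    cost = (dx[:, :, None] + dy[:, None, :]).reshape(n, n * n)

    # Joint atoms carry mass 1/n, product atoms carry mass 1/n^2.
    # Splitting every joint atom into n copies of mass 1/n^2 turns the
    # transport problem into an assignment problem of size n^2 x n^2.
    cost = np.repeat(cost, n, axis=0)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].sum() / n**2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=15)
    print("dependent pair:  ", transport_dependency_plugin(x, x))
    print("constant partner:", transport_dependency_plugin(x, np.zeros(15)))
```

The cost matrix has $n^2 \times n^2$ entries, so this sketch is only practical for small samples, which is consistent with the abstract's motivation for cheaper alternative estimators.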