Reliable measurement of dependence between variables is essential in many applications of statistics and machine learning. Current approaches to dependence estimation, especially density-based approaches, lack precision, robustness, and/or interpretability (in terms of the type of dependence being estimated). We propose a two-step approach for quantifying dependence between random variables: 1) We first decompose the probability density functions (PDFs) of the variables involved into multiple local moments of uncertainty that systematically and precisely identify the different regions of each PDF (with special emphasis on the tail regions). 2) We then compute an optimal transport map to measure the geometric similarity between the corresponding sets of decomposed local uncertainty moments of the variables. Dependence is then determined by the degree of one-to-one correspondence between the respective uncertainty moments of the variables in the optimal transport map. For the multi-moment uncertainty decomposition, we use a recently introduced framework based on the Gaussian reproducing kernel Hilbert space (RKHS). Because it is based on the Gaussian RKHS, our approach is robust to outliers and to monotone transformations of the data, while the multiple moments of uncertainty provide high resolution and interpretability of the type of dependence being quantified. We support these claims with preliminary results on simulated data.
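To make the two-step idea concrete, the following is a minimal sketch rather than the method proposed here. It rests on several assumptions that are not taken from the abstract: the hypothetical function `local_features` stands in for the multi-moment uncertainty decomposition by evaluating Gaussian-kernel density estimates of the standardized samples at a few arbitrarily chosen bandwidths, the optimal transport problem is restricted to two equal-sized point sets with uniform weights (where it reduces to a linear assignment problem), and the hypothetical `correspondence_score` summarizes one-to-one correspondence as the fraction of samples whose features are matched to the features of their own paired observation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def local_features(samples, bandwidths=(0.25, 0.5, 1.0)):
    """Hypothetical stand-in for the local uncertainty moments.

    Assumption: each sample is described by Gaussian-kernel density
    estimates at several bandwidths. This is NOT the RKHS-based
    decomposition referred to in the abstract.
    """
    z = (samples - samples.mean()) / samples.std()  # remove location and scale
    diffs = z[:, None] - z[None, :]                 # pairwise differences, shape (n, n)
    # One column per bandwidth: mean Gaussian kernel value at each sample.
    return np.stack(
        [np.exp(-diffs**2 / (2.0 * h**2)).mean(axis=1) for h in bandwidths],
        axis=1,
    )


def correspondence_score(x, y):
    """Fraction of paired samples (x_i, y_i) matched to each other.

    With uniform weights on two equal-sized point sets, an optimal
    transport plan can be taken to be a permutation, so the problem is
    solved here as a linear assignment problem.
    """
    fx, fy = local_features(x), local_features(y)
    cost = cdist(fx, fy, metric="sqeuclidean")      # squared Euclidean ground cost
    rows, cols = linear_sum_assignment(cost)        # optimal one-to-one matching
    return float(np.mean(rows == cols))             # share of mass on the diagonal


rng = np.random.default_rng(0)
x = rng.normal(size=200)
print("identical copy:", correspondence_score(x, x.copy()))
print("independent   :", correspondence_score(x, rng.normal(size=200)))
```

In this toy setting, a score near 1 means the transport map pairs each sample's features with those of its own partner, and a score near 0 means the matching ignores the pairing. Because the stand-in features depend only on local density, they cannot by themselves separate every type of dependence. The sketch is meant to show the transport-and-correspondence step only.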