We propose a direct and natural extension of Azadkia & Chatterjee's rank correlation $T$, introduced in [4], to a set of $q \geq 1$ endogenous variables. The approach builds upon converting the original vector-valued problem into a univariate problem and then applying the rank correlation $T$ to it. The novel measure $T^q$ quantifies the scale-invariant extent of functional dependence of an endogenous vector ${\bf Y} = (Y_1,\dots,Y_q)$ on a number of exogenous variables ${\bf X} = (X_1,\dots,X_p)$, $p\geq1$. It characterizes independence of ${\bf X}$ and ${\bf Y}$ as well as perfect dependence of ${\bf Y}$ on ${\bf X}$, and hence fulfills all the desired characteristics of a measure of predictability. Aiming at maximum interpretability, we provide various general invariance and continuity conditions for $T^q$ as well as novel ordering results for conditional distributions, revealing new insights into the nature of $T$. Building upon the graph-based estimator for $T$ in [4], we present a non-parametric estimator for $T^q$ that is strongly consistent in full generality, i.e., without any distributional assumptions. Based on this estimator, we develop a model-free and dependence-based feature ranking and forward feature selection for multiple-outcome data, and establish tools for identifying networks between random variables. Real case studies illustrate the main aspects of the developed methodology.
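To make the building block concrete, the following is a minimal sketch of a nearest-neighbour estimator for the univariate-response rank correlation $T$ of [4]. It assumes the unconditional form $T_n = \sum_i (n \min\{R_i, R_{N(i)}\} - L_i^2) / \sum_i L_i (n - L_i)$, where $R_i$ and $L_i$ count the observations with $Y_j \leq Y_i$ and $Y_j \geq Y_i$, and $N(i)$ is the index of the nearest neighbour of ${\bf X}_i$. The function name `ac_rank_correlation` is hypothetical, ties among nearest neighbours are broken by lowest index rather than at random, and the sketch does not implement the multi-response measure $T^q$, whose construction is not spelled out here.

```python
import numpy as np


def ac_rank_correlation(x, y):
    """Sketch of a nearest-neighbour estimate of T(Y | X) for scalar y.

    x: array of shape (n,) or (n, p); y: array of shape (n,).
    Hypothetical helper for illustration, not the authors' code.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = len(y)

    # R_i = #{j : y_j <= y_i},  L_i = #{j : y_j >= y_i}
    R = (y[None, :] <= y[:, None]).sum(axis=1)
    L = (y[None, :] >= y[:, None]).sum(axis=1)

    # N(i): nearest neighbour of x_i among the other points (brute force).
    # Assumption: ties are broken by lowest index, not at random.
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    N = d.argmin(axis=1)

    num = (n * np.minimum(R, R[N]) - L ** 2).sum()
    den = (L * (n - L)).sum()
    return num / den


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 500)
    # Non-monotone functional dependence: the estimate should be near 1.
    print(ac_rank_correlation(x, x ** 2))
    # Independent noise: the estimate should be near 0.
    print(ac_rank_correlation(x, rng.normal(size=500)))
```

The brute-force distance matrix costs $O(n^2)$ memory and is chosen only for readability; a tree-based nearest-neighbour search would be the natural replacement for larger samples.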