We propose a coefficient of conditional dependence between two random variables $Y$ and $Z$ given a set of other variables $X_1,\ldots,X_p$, based on an i.i.d. sample. The coefficient has a long list of desirable properties, the most important of which is that under absolutely no distributional assumptions, it converges to a limit in $[0,1]$, where the limit is $0$ if and only if $Y$ and $Z$ are conditionally independent given $X_1,\ldots,X_p$, and is $1$ if and only if $Y$ is equal to a measurable function of $Z$ given $X_1,\ldots,X_p$. Moreover, it has a natural interpretation as a nonlinear generalization of the familiar partial $R^2$ statistic for measuring conditional dependence by regression. Using this statistic, we devise a new variable selection algorithm, called Feature Ordering by Conditional Independence (FOCI), which is model-free, has no tuning parameters, and is provably consistent under sparsity assumptions. A number of applications to synthetic and real datasets are worked out.