Chatterjee's rank correlation coefficient $\xi_n$ is an empirical index for detecting functional dependencies between two variables $X$ and $Y$. It is an estimator for a theoretical quantity $\xi$ that is zero for independence and one if $Y$ is a measurable function of $X$. Based on an equivalent characterization of sorted numbers, we derive an upper bound for $\xi_n$ and suggest a simple normalization aimed at reducing its bias for small sample size $n$. In Monte Carlo simulations of various models, the normalization reduced the bias in all cases. The mean squared error was reduced, too, for values of $\xi$ greater than about 0.4. Moreover, we observed that non-parametric confidence intervals for $\xi$ based on bootstrapping $\xi_n$ in the usual n-out-of-n way have a coverage probability close to zero. This is remedied by an m-out-of-n bootstrap without replacement in combination with our normalization method.
翻译:暂无翻译