This paper introduces a new property of estimators of the strength of statistical association, which helps characterize how well an estimator will perform when dependencies between continuous and discrete random variables must be rank ordered. The new property, termed the estimator response curve, is easily computable and provides a way to assess an estimator's performance that is agnostic to the marginal distributions. It overcomes notable drawbacks of current assessment metrics, including statistical power, bias, and consistency. We use the estimator response curve to test various measures of the strength of association that satisfy the data processing inequality (DPI), and show that the CIM estimator compares favorably to the kNN, vME, AP, and H_{MI} estimators of mutual information. Estimators that the estimator response curve identifies as suboptimal perform worse than those it identifies as closer to optimal when tested on real-world data from four different areas of science, with varying dimensionalities and sizes.