We present the $U$-Statistic Permutation (USP) test of independence in the context of discrete data displayed in a contingency table. Either Pearson's chi-squared test of independence or the $G$-test is typically used for this task, but we argue that these tests have serious deficiencies, both in their failure to control the size of the test and in their power properties. By contrast, the USP test is guaranteed to control the size of the test at the nominal level for all sample sizes, has no issues with small (or zero) cell counts, and is able to detect distributions that violate independence in only a minimal way. The test statistic is derived from a $U$-statistic estimator of a natural population measure of dependence, and we prove that this is the unique minimum variance unbiased estimator of this population quantity. The practical utility of the USP test is demonstrated both on simulated data, where its power can be dramatically greater than that of Pearson's test and the $G$-test, and on real data. The USP test is implemented in the R package USP.
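To illustrate how a permutation test can control size at any sample size, the following is a minimal Python sketch, not the USP implementation. It assumes a simple plug-in statistic, $\sum_{i,j} (\hat p_{ij} - \hat p_{i\cdot}\hat p_{\cdot j})^2$, as a stand-in for the $U$-statistic estimator described above; the function names `dependence_stat` and `permutation_test` are hypothetical. Only the permutation calibration step is meant to carry over.

```python
import random
from collections import Counter


def dependence_stat(xs, ys):
    """Plug-in estimate of sum_{i,j} (p_ij - p_i. * p_.j)^2.

    Assumed stand-in statistic; the USP test uses a U-statistic instead.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    # Integer numerator, so equal tables give exactly equal statistics.
    numerator = sum(
        (n * joint[(i, j)] - row[i] * col[j]) ** 2
        for i in row
        for j in col
    )
    return numerator / n**4


def permutation_test(table, n_perm=999, seed=0):
    """Permutation p-value for independence in a table of cell counts."""
    # Expand the table into one (row label, column label) pair per observation.
    xs, ys = [], []
    for i, counts in enumerate(table):
        for j, count in enumerate(counts):
            xs.extend([i] * count)
            ys.extend([j] * count)

    rng = random.Random(seed)
    observed = dependence_stat(xs, ys)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling one set of labels mimics the independence null hypothesis
        # while keeping both margins fixed.
        rng.shuffle(ys)
        if dependence_stat(xs, ys) >= observed:
            exceed += 1
    # Counting the observed table itself keeps the p-value valid for any n.
    return (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    print(permutation_test([[10, 0], [0, 10]]))  # strongly dependent table
    print(permutation_test([[5, 5], [5, 5]]))    # perfectly balanced table
```

Because the p-value is computed by ranking the observed statistic among permuted copies of the same data, rejecting when it is at most $\alpha$ has size at most $\alpha$ under independence, and zero cell counts need no special handling.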