Testing the hypothesis of independence between two random elements on a joint alphabet is a fundamental exercise in statistics. Pearson's chi-squared test is effective in this setting when the contingency table is relatively small, but general statistical tools are lacking when the contingency table is large or sparse. This article derives and proposes a test based on generalized mutual information. The new test has two desirable theoretical properties. First, the test statistic is asymptotically normal under the hypothesis of independence; consequently, it does not require knowledge of the row and column sizes of the contingency table. Second, the test is consistent, and therefore it detects any form of dependence structure in the general alternative space given a sufficiently large sample. In addition, simulation studies show that the proposed test converges faster than Pearson's chi-squared test when the contingency table is large or sparse.
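For orientation, the following is a minimal Python sketch of the two classical quantities the abstract builds on: Pearson's chi-squared statistic and the plug-in (empirical) mutual information of a contingency table. It is not the generalized mutual information statistic proposed in the article, whose definition is not given in this abstract. The function names `pearson_chi2` and `plugin_mutual_information` are hypothetical and chosen only for illustration.

```python
import math


def _margins(table):
    """Return (n, row totals, column totals) for a table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return sum(row_tot), row_tot, col_tot


def pearson_chi2(table):
    """Pearson's chi-squared statistic and its degrees of freedom.

    The degrees of freedom depend on the number of (non-empty) rows and
    columns, which is the table-size knowledge the abstract refers to.
    """
    n, row_tot, col_tot = _margins(table)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    rows = sum(1 for t in row_tot if t > 0)
    cols = sum(1 for t in col_tot if t > 0)
    return stat, (rows - 1) * (cols - 1)


def plugin_mutual_information(table):
    """Plug-in estimate of mutual information (in nats) from counts.

    Illustrative only: this is the ordinary empirical mutual information,
    not the generalized version used by the proposed test.
    """
    n, row_tot, col_tot = _margins(table)
    mi = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                ratio = observed * n / (row_tot[i] * col_tot[j])
                mi += (observed / n) * math.log(ratio)
    return mi
```

Calling `pearson_chi2(table)` on a list of rows of counts returns the statistic together with the degrees of freedom needed to look up a chi-squared critical value. The plug-in mutual information is zero exactly when the empirical joint distribution factorizes into the product of its margins.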