Significance testing -- especially the paired-permutation test -- has played a vital role in developing NLP systems to provide confidence that the difference in performance between two systems (i.e., the test statistic) is not due to luck. However, practitioners rely on Monte Carlo approximation to perform this test due to a lack of a suitable exact algorithm. In this paper, we provide an efficient exact algorithm for the paired-permutation test for a family of structured test statistics. Our algorithm runs in $\mathcal{O}(GN (\log GN )(\log N ))$ time where $N$ is the dataset size and $G$ is the range of the test statistic. We found that our exact algorithm was $10$x faster than the Monte Carlo approximation with $20000$ samples on a common dataset.
翻译:质量测试 -- -- 特别是配对变异测试 -- -- 在开发NLP系统以使人们相信两个系统(即测试统计)的性能差异不是运气造成的,因而在建立NLP系统方面发挥了至关重要的作用。然而,由于缺乏合适的精确算法,从业人员依赖蒙特卡洛近似值来进行这项测试。在本文中,我们为一组结构化测试统计数据的组合的配对变异测试提供了高效的精确算法。我们的算法以$\mathcal{O}(GN (log GN)(\log N))) 运行,用美元运行,用美元计算数据集大小为N$,用$G$作为测试统计数据的范围。我们发现,我们的精确算法比蒙特卡洛近似值高出1 000美元,共同数据集的样本为2,000美元。