Classical tests for a difference in means control the type I error rate when the groups are defined a priori. However, when the groups are instead defined via clustering, applying a classical test yields an extremely inflated type I error rate. Notably, this problem persists even if two separate and independent data sets are used to define the groups and to test for a difference in their means. To address this problem, we propose a selective inference approach to test for a difference in means between two clusters. Our procedure controls the selective type I error rate by accounting for the fact that the choice of null hypothesis was made based on the data. We describe how to efficiently compute exact p-values for clusters obtained using agglomerative hierarchical clustering with many commonly used linkages. We apply our method to simulated data and to single-cell RNA-sequencing data.
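The inflation described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the procedure proposed in the paper: it draws one-dimensional standard normal data with no true clusters, splits each sample into two groups with a naive average-linkage agglomerative clustering, and applies a classical two-sided z-test with known variance as if the groups had been fixed in advance. The function names (`two_clusters_average_linkage`, `z_test_pvalue`, `rejection_rates`), the sample size, the number of simulations, and the choice of average linkage are all illustrative assumptions. For comparison, the same test is applied to an a priori split (first half versus second half of the sample).

```python
import random
from statistics import NormalDist


def two_clusters_average_linkage(points):
    """Naive agglomerative clustering of 1-D points (average linkage), cut at 2 clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > 2:
        best = None  # (distance, i, j) for the closest pair of clusters found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci, cj = clusters[i], clusters[j]
                # Average linkage: mean pairwise distance between the two clusters.
                d = sum(abs(a - b) for a in ci for b in cj) / (len(ci) * len(cj))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters[0], clusters[1]


def z_test_pvalue(a, b, sigma=1.0):
    """Classical two-sided z-test for equal means; treats groups as fixed, sigma as known."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(diff) / se))


def rejection_rates(n_sims=200, n=30, alpha=0.05, seed=0):
    """Fraction of null data sets rejected at level alpha: (clustered groups, a priori groups)."""
    rng = random.Random(seed)
    naive = fixed = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]  # global null: a single population
        g1, g2 = two_clusters_average_linkage(x)
        naive += z_test_pvalue(g1, g2) < alpha  # groups chosen by looking at the data
        fixed += z_test_pvalue(x[: n // 2], x[n // 2:]) < alpha  # groups fixed in advance
    return naive / n_sims, fixed / n_sims


naive_rate, fixed_rate = rejection_rates()
print(f"rejection rate, groups from clustering: {naive_rate:.2f}")
print(f"rejection rate, groups fixed a priori:  {fixed_rate:.2f}")
```

In this sketch every data set satisfies the null, so any rejection is a type I error. The a priori split should reject at a rate close to the nominal level, while the clustering-defined split rejects far more often, because the clustering step separates the observations precisely so that the group means differ.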