High-throughput sequencing technology allows us to test for differences in bacterial composition between populations. One important feature of human microbiome data is that they often include a large number of zeros. Such data can be treated as being generated from a two-part model that includes a point mass at zero. Motivated by the analysis of such non-negative data with excess zeros, we introduce several truncated rank-based tests for two-group and multi-group comparisons: a truncated Wilcoxon rank-sum test for two-group comparison and two truncated Kruskal-Wallis tests for multi-group comparison. We show, both analytically through asymptotic relative efficiency analysis and by simulation, that the proposed tests have higher power than the standard rank-based tests, especially when the proportion of zeros in the data is high. The tests can also be applied to repeated measurements of compositional data via simple within-subject permutations. We apply the tests to a gut microbiome data set to compare the microbiome compositions of healthy subjects and pediatric Crohn's disease patients and to assess treatment effects on microbiome composition. We identify several bacterial genera that are missed by the standard rank-based tests.
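To make the idea of a truncated rank-based two-group comparison concrete, the following is a minimal sketch and not the paper's exact statistic. It assumes one plausible truncation rule: zeros are dropped from each group until both groups retain the same proportion of non-zero observations, and a standardized rank-sum is then computed on the truncated sample. The p-value comes from a label permutation, which is swapped in here in place of the asymptotic null distribution analysed in the paper. All function names (`midranks`, `truncate_zeros`, `truncated_rank_sum`, `permutation_pvalue`) are hypothetical.

```python
import random


def midranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def truncate_zeros(x, y):
    """Assumed rule: keep all non-zeros, and only enough zeros that each
    group has max(px, py) * n observations, where px and py are the
    observed non-zero proportions."""
    px = sum(v > 0 for v in x) / len(x)
    py = sum(v > 0 for v in y) / len(y)
    p = max(px, py)
    truncated = []
    for group in (x, y):
        nonzero = [v for v in group if v > 0]
        keep = max(len(nonzero), round(p * len(group)))
        truncated.append(nonzero + [0.0] * (keep - len(nonzero)))
    return truncated[0], truncated[1]


def truncated_rank_sum(x, y):
    """Standardized rank-sum of x after truncation (variance ignores ties)."""
    tx, ty = truncate_zeros(x, y)
    n1, n2 = len(tx), len(ty)
    if n1 == 0 or n2 == 0:
        return 0.0
    ranks = midranks(tx + ty)
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return (w - mean) / sd


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided p-value from random relabelling of the pooled sample."""
    rng = random.Random(seed)
    observed = abs(truncated_rank_sum(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(truncated_rank_sum(pooled[:len(x)], pooled[len(x):]))
        if stat >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up relative abundances, for illustration only.
    group_a = [0, 0, 0, 0, 0.01, 0.02, 0.05]
    group_b = [0, 0, 0.03, 0.04, 0.08, 0.10, 0.20]
    print(truncated_rank_sum(group_a, group_b))
    print(permutation_pvalue(group_a, group_b))
```

Because the truncation is recomputed inside every permutation, the permutation p-value remains valid under the null of identical distributions regardless of how the truncation rule is chosen; only the power depends on that choice.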