Differential abundance testing in compositional data is a fundamental task in many biomedical applications, such as single-cell, bulk RNA-seq, and microbiome data analysis. Despite recent developments in these fields, differential abundance analysis of compositional data remains a difficult and unsolved statistical problem because of the compositional constraint and the prevalence of zero counts in the data. This study introduces a new differential abundance test, the robust differential abundance (RDB) test, to address these challenges. Compared with existing methods, the RDB test 1) is simple and computationally efficient, 2) is robust to prevalent zero counts in compositional datasets, 3) takes the compositional nature of the data into account, and 4) has a theoretical guarantee of controlling false discoveries in a general setting. Furthermore, in the presence of observed covariates, the RDB test can be combined with covariate balancing techniques to remove potential confounding effects and draw reliable conclusions. Finally, we apply the new test to several numerical examples using simulated and real datasets to demonstrate its practical merits.