Evaluation of statistical approaches for association testing in noisy drug screening data

Petr Smirnov, Ian Smith, Zhaleh Safikhani, Wail Ba-alawi, Farnoosh Khodakarami, Eva Lin, Yihong Yu, Scott Martin, Janosch Ortmann, Tero Aittokallio, Marc Hafner, Benjamin Haibe-Kains

Identifying associations among biological variables is a major challenge in modern quantitative biological research, particularly given the systematic and statistical noise endemic to biological systems. Drug sensitivity data have proven to be a particularly challenging setting in which to identify associations that can inform patient treatment. To address this, we introduce two semi-parametric variations on the commonly used Concordance Index: the robust Concordance Index (rCI) and the kernelized Concordance Index (kCI), both of which incorporate measurements of the noise distribution from the data. We demonstrate that common statistical tests applied to the concordance index and its variations fail to control for false positives, and we introduce efficient implementations that compute p-values using adaptive permutation testing. We then evaluate the statistical power of these coefficients under simulation and compare them with the Pearson and Spearman correlation coefficients. Finally, we evaluate the various statistics in matching drugs across pharmacogenomic datasets. We observe that the rCI and kCI are better powered than the concordance index in simulation and show some improvement on real data. Surprisingly, we observe that the Pearson correlation was the most robust to measurement noise among the different metrics.
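To make the quantities named in the abstract concrete, the following is a minimal Python sketch of a pairwise concordance index together with a permutation p-value. It is an illustration under stated assumptions, not the authors' implementation. The function names (`concordance_index`, `permutation_pvalue`), the `delta_x` and `delta_y` thresholds, and the `max_exceed` early-stopping rule are all hypothetical. In particular, skipping pairs whose difference falls within a noise threshold is only one plausible way of using noise information, and the early stop is a simple stand-in for an adaptive permutation scheme.

```python
import random
from itertools import combinations


def concordance_index(x, y, delta_x=0.0, delta_y=0.0):
    """Fraction of usable pairs that x and y order the same way.

    Assumption for illustration: a pair is skipped when its absolute
    difference in x is <= delta_x or in y is <= delta_y. With both
    thresholds at 0 this only skips exact ties.
    """
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if abs(dx) <= delta_x or abs(dy) <= delta_y:
            continue  # tie, or difference within the assumed noise threshold
        if dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    # With no usable pairs, return the value expected under no association.
    return concordant / total if total else 0.5


def permutation_pvalue(x, y, n_perm=1000, max_exceed=None, seed=0, **thresholds):
    """Two-sided permutation p-value for the concordance index.

    Assumption for illustration: if max_exceed is set, sampling stops as
    soon as that many permuted statistics are at least as extreme as the
    observed one. The +1 correction keeps the estimate above zero.
    """
    rng = random.Random(seed)
    observed = abs(concordance_index(x, y, **thresholds) - 0.5)
    y_perm = list(y)
    exceed = 0
    done = 0
    for done in range(1, n_perm + 1):
        rng.shuffle(y_perm)  # break any association between x and y
        permuted = abs(concordance_index(x, y_perm, **thresholds) - 0.5)
        if permuted >= observed:
            exceed += 1
            if max_exceed is not None and exceed >= max_exceed:
                break  # clearly non-significant; stop early
    return (exceed + 1) / (done + 1)
```

Calling `permutation_pvalue(x, y, n_perm=10000, max_exceed=20)` spends few permutations on pairs that are clearly not associated and the full budget only on candidates that might be significant. This is the general motivation for adaptive schemes when many drug and feature pairs are tested.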
