Evaluation of statistical approaches for association testing in noisy drug screening data

Petr Smirnov, Ian Smith, Zhaleh Safikhani, Wail Ba-alawi, Farnoosh Khodakarami, Eva Lin, Yihong Yu, Scott Martin, Janosch Ortmann, Tero Aittokallio, Marc Hafner, Benjamin Haibe-Kains

Identifying associations among biological variables is a major challenge in modern quantitative biological research, particularly given the systemic and statistical noise endemic to biological systems. Drug sensitivity data has proven to be a particularly challenging field for identifying associations to inform patient treatment. To address this, we introduce two semi-parametric variations on the commonly used concordance index: the robust concordance index (rCI) and the kernelized concordance index (kCI), which incorporate measurements about the noise distribution from the data. We demonstrate that common statistical tests applied to the concordance index and its variations fail to control for false positives, and introduce efficient implementations to compute p-values using adaptive permutation testing. We then evaluate the statistical power of these coefficients under simulation and compare them with the Pearson and Spearman correlation coefficients. Finally, we evaluate the various statistics in matching drugs across pharmacogenomic datasets. We observe that the rCI and kCI are better powered than the concordance index in simulation and show some improvement on real data. Surprisingly, we observe that the Pearson correlation was the most robust to measurement noise among the different metrics.
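
To make the two central ideas concrete, the following is a minimal sketch of the plain concordance index together with a permutation p-value that stops early. It is not the authors' implementation: the function names, the tie handling (tied pairs are skipped), and the early-stopping rule (stop once `min_exceed` permuted statistics are at least as extreme as the observed one) are all assumptions chosen for illustration, and the rCI and kCI variants are not reproduced here.

```python
import random


def concordance_index(x, y):
    """Fraction of comparable pairs that x and y order the same way.

    Assumption for this sketch: pairs tied in x or in y are skipped.
    Returns 0.5 (no association) when no pair is comparable.
    """
    concordant = 0
    comparable = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 or dy == 0:
                continue  # tied pair: not comparable
            comparable += 1
            if (dx > 0) == (dy > 0):
                concordant += 1
    return concordant / comparable if comparable else 0.5


def permutation_pvalue(x, y, max_perms=2000, min_exceed=50, seed=0):
    """Two-sided permutation p-value for the concordance index.

    Hypothetical early-stopping rule: once `min_exceed` permutations are
    at least as extreme as the observed statistic, the result is clearly
    non-significant and further permutations are skipped.
    """
    rng = random.Random(seed)
    observed = abs(concordance_index(x, y) - 0.5)
    y_perm = list(y)
    exceed = 0
    done = 0
    while done < max_perms:
        rng.shuffle(y_perm)
        done += 1
        if abs(concordance_index(x, y_perm) - 0.5) >= observed:
            exceed += 1
            if exceed >= min_exceed:
                break
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (done + 1)


if __name__ == "__main__":
    dose_a = [0.1, 0.4, 0.5, 0.9, 1.3, 1.8, 2.2, 2.9]
    dose_b = [0.2, 0.3, 0.7, 0.8, 1.1, 2.0, 2.1, 3.5]
    print(concordance_index(dose_a, dose_b))
    print(permutation_pvalue(dose_a, dose_b))
```

The early exit spends most of the permutation budget on associations that might be significant, which is the general motivation behind adaptive schemes; the specific thresholds above are placeholders rather than recommended values.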
