Genomic data are subject to various sources of confounding, such as batch effects and cell mixtures. To identify genomic features associated with a variable of interest in the presence of confounders, the traditional approach fits a confounder-adjusted regression model to each genomic feature and then applies a multiplicity correction. Previously, we showed that this procedure is suboptimal and proposed a more powerful alternative, the two-dimensional false discovery rate control (2dFDR) procedure, which relies on test statistics from both confounder-adjusted and unadjusted linear regression models (Yi et al., 2021). Although 2dFDR provides a significant power improvement over the traditional method, it rests on a linear model assumption that may be too restrictive in some practical settings. This study proposes a model-free two-dimensional false discovery rate control procedure (MF-2dFDR) that significantly broadens the scope and applicability of 2dFDR. MF-2dFDR uses marginal independence test statistics as auxiliary information to filter out less promising features, and then performs FDR control on the remaining features based on conditional independence test statistics. MF-2dFDR provides (asymptotically) valid inference in settings where the conditional distribution of the genomic variables given the covariate of interest and the confounders is arbitrary and completely unknown. To achieve this, our method requires the conditional distribution of the covariate given the confounders to be known or estimable from the data. We develop a conditional randomization procedure to simultaneously select the two cutoff values for the marginal and conditional independence test statistics. Promising finite-sample performance is demonstrated through extensive simulations and real data applications.
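The two-cutoff idea can be illustrated with a minimal Python sketch. It is not the MF-2dFDR procedure itself, only a toy version under stated assumptions: the covariate given the confounders is taken to follow a known Gaussian linear model (the parameters `gamma` and `sigma` are hypothetical), absolute correlations stand in for the marginal and conditional independence test statistics, and the cutoff pair is chosen by a coarse grid search. All function names are illustrative.

```python
import numpy as np

def resample_x(z, gamma, sigma, rng):
    """Draw a null copy of x from the ASSUMED known model x | z ~ N(z @ gamma, sigma^2)."""
    return z @ gamma + sigma * rng.standard_normal(z.shape[0])

def marginal_stat(x, y):
    """Absolute Pearson correlation between x and each column of y (stand-in statistic)."""
    xc = x - x.mean()
    yc = y - y.mean(axis=0)
    return np.abs(xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc, axis=0))

def conditional_stat(x, y, z, gamma):
    """Absolute correlation between x - E[x | z] and each column of y (stand-in statistic)."""
    return marginal_stat(x - z @ gamma, y)

def select_2d(x, y, z, gamma, sigma, alpha=0.1, n_null=20, seed=0):
    """Toy two-cutoff selection: filter on the marginal statistic, reject on the conditional one."""
    rng = np.random.default_rng(seed)
    t_m = marginal_stat(x, y)
    t_c = conditional_stat(x, y, z, gamma)

    # Conditional randomization: redraw x given z, keep y and z fixed,
    # and recompute both statistics to mimic their joint null behaviour.
    null_x = [resample_x(z, gamma, sigma, rng) for _ in range(n_null)]
    null_m = np.array([marginal_stat(xb, y) for xb in null_x])
    null_c = np.array([conditional_stat(xb, y, z, gamma) for xb in null_x])

    best = np.zeros(y.shape[1], dtype=bool)
    for c1 in np.quantile(t_m, [0.0, 0.25, 0.5, 0.75]):  # coarse grid, an assumption
        for c2 in np.unique(t_c):
            rejected = (t_m >= c1) & (t_c >= c2)
            # Average count of null-copy statistics passing both cutoffs
            # serves as a rough estimate of the number of false discoveries.
            false_est = ((null_m >= c1) & (null_c >= c2)).sum(axis=1).mean()
            fdp_est = false_est / max(rejected.sum(), 1)
            if fdp_est <= alpha and rejected.sum() > best.sum():
                best = rejected
    return best
```

Here `x` is the covariate of interest (length `n`), `y` is an `n`-by-`m` matrix of genomic features, and `z` is the confounder matrix. The function returns a boolean mask over the `m` features. The sketch carries no formal FDR guarantee; it only shows how resampling the covariate given the confounders lets both cutoffs be calibrated jointly.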