CRISPR genome engineering and single-cell RNA sequencing have transformed biological discovery. Single-cell CRISPR screens unite these two technologies, linking genetic perturbations in individual cells to changes in gene expression and illuminating regulatory networks underlying diseases. Despite their promise, single-cell CRISPR screens present substantial statistical challenges. We demonstrate through theoretical and real data analyses that a standard method for estimation and inference in single-cell CRISPR screens -- "thresholded regression" -- exhibits attenuation bias and a bias-variance tradeoff as a function of an intrinsic, challenging-to-select tuning parameter. To overcome these difficulties, we introduce GLM-EIV ("GLM-based errors-in-variables"), a new method for single-cell CRISPR screen analysis. GLM-EIV extends the classical errors-in-variables model to responses and noisy predictors that are exponential family-distributed and potentially impacted by the same set of confounding variables. We develop a computational infrastructure to deploy GLM-EIV across tens or hundreds of nodes on clouds (e.g., Microsoft Azure) and high-performance clusters. Leveraging this infrastructure, we apply GLM-EIV to analyze two recent, large-scale, single-cell CRISPR screen datasets, demonstrating improved performance in challenging problem settings.
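The attenuation bias attributed to thresholded regression above can be illustrated with a small simulation. This is a minimal sketch and not the paper's model or the GLM-EIV method. It assumes a toy setup in which the latent perturbation indicator, the gRNA counts, and the gene expression counts are Poisson-distributed with made-up parameter values, and there are no confounders. All names (`simulate_screen`, `log_fold_change`, `TRUE_EFFECT`) are hypothetical. With a single binary covariate, the log ratio of group means equals the Poisson regression coefficient, so no GLM library is needed.

```python
import numpy as np

# Assumed toy parameters (not taken from the paper).
PERTURBATION_PROB = 0.05                  # fraction of cells truly perturbed
GRNA_MEAN_BACKGROUND = 1.0                # mean gRNA count in unperturbed cells
GRNA_MEAN_PERTURBED = 5.0                 # mean gRNA count in perturbed cells
EXPR_MEAN_UNPERTURBED = 10.0              # mean gene expression without perturbation
EXPR_MEAN_PERTURBED = 5.0                 # mean gene expression with perturbation
TRUE_EFFECT = float(np.log(EXPR_MEAN_PERTURBED / EXPR_MEAN_UNPERTURBED))


def simulate_screen(n_cells=200_000, seed=0):
    """Draw latent perturbation status, noisy gRNA counts, and expression counts."""
    rng = np.random.default_rng(seed)
    perturbed = rng.random(n_cells) < PERTURBATION_PROB
    grna = rng.poisson(np.where(perturbed, GRNA_MEAN_PERTURBED, GRNA_MEAN_BACKGROUND))
    expression = rng.poisson(np.where(perturbed, EXPR_MEAN_PERTURBED, EXPR_MEAN_UNPERTURBED))
    return perturbed, grna, expression


def log_fold_change(expression, assigned):
    """Log ratio of mean expression between assigned and unassigned cells.

    This equals the slope of a Poisson regression (log link) of expression
    on the binary assignment indicator.
    """
    return float(np.log(expression[assigned].mean() / expression[~assigned].mean()))


perturbed, grna, expression = simulate_screen()

# Oracle estimate: uses the true (normally unobserved) perturbation status.
oracle = log_fold_change(expression, perturbed)

# Thresholded estimates: call a cell perturbed when its gRNA count >= threshold.
estimates = {c: log_fold_change(expression, grna >= c) for c in (1, 3, 5)}

print(f"true effect: {TRUE_EFFECT:.3f}, oracle estimate: {oracle:.3f}")
for threshold, estimate in estimates.items():
    n_assigned = int((grna >= threshold).sum())
    print(f"threshold {threshold}: estimate {estimate:.3f} ({n_assigned} cells assigned)")
```

In this toy setting every thresholded estimate is pulled toward zero relative to the true effect, because each threshold misclassifies some cells. Raising the threshold reduces that bias but assigns fewer cells to the perturbed group, which is the source of the bias-variance tradeoff the abstract describes.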