Single-cell technologies are revolutionizing the entire field of biology. The large volumes of data they generate are high-dimensional, sparse, and heterogeneous, with complicated dependency structures, which makes analysis with conventional machine learning approaches challenging and often impractical. In tackling these challenges, deep learning often demonstrates superior performance compared to traditional machine learning methods. In this work, we give a comprehensive survey of deep learning in single-cell analysis. We first introduce background on single-cell technologies and their development, as well as fundamental concepts of deep learning, including the most popular deep architectures. We then present an overview of the single-cell analysis pipeline pursued in research applications, noting divergences due to data sources or specific applications. Next, we review seven popular tasks spanning different stages of the single-cell analysis pipeline: multimodal integration, imputation, clustering, spatial domain identification, cell-type deconvolution, cell segmentation, and cell-type annotation. For each task, we describe the most recent developments in classical and deep learning methods, discuss their advantages and disadvantages, and summarize the available deep learning tools and benchmark datasets. Finally, we discuss future directions and the most recent challenges. This survey will serve as a reference for biologists and computer scientists, encouraging collaborations.