Accurate morphological classification of white blood cells (WBCs) is an important step in the diagnosis of leukemia, a disease in which nonfunctional blast cells accumulate in the bone marrow. Recently, deep convolutional neural networks (CNNs) have been used successfully to classify leukocytes by training them on single-cell images from a specific domain. Most CNN models assume that the training and test data follow similar distributions, i.e., that the data are independently and identically distributed. They are therefore not robust to differences in staining procedures, magnifications, resolutions, scanners, or imaging protocols, nor to variations across clinical centers or patient cohorts. In addition, domain-specific data imbalances affect the generalization performance of classifiers. Here, we train a robust CNN for WBC classification by addressing cross-domain data imbalance and domain shifts. To this end, we use two loss functions and demonstrate their effectiveness in out-of-distribution (OOD) generalization. Our approach achieves the best macro F1 score compared with existing methods and accounts for rare cell types. This is the first demonstration of imbalanced domain generalization in hematological cytomorphology, and it paves the way for robust single-cell classification methods for application in laboratories and clinics.
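To illustrate why a macro-averaged F1 score is a natural metric when rare cell types matter, the following minimal sketch computes it from scratch. The class names and the toy labels are hypothetical and are not taken from the study's data; the function `macro_f1` is an illustrative helper, not the authors' evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy example: nine common cells and one rare cell.
y_true = ["neutrophil"] * 9 + ["basophil"]
# A classifier that ignores the rare class entirely.
y_pred = ["neutrophil"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f}, macro F1={score:.3f}")
```

In this toy case the classifier reaches high accuracy while missing every rare cell, yet its macro F1 is roughly half of that, because each class contributes equally to the average regardless of how many samples it has.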