Accurate morphological classification of white blood cells (WBCs) is an important step in the diagnosis of leukemia, a disease in which nonfunctional blast cells accumulate in the bone marrow. Recently, deep convolutional neural networks (CNNs) have been used successfully to classify leukocytes after training on single-cell images from a specific domain. Most CNN models assume that the training and test data follow the same distribution, i.e., that the data are independently and identically distributed. As a result, they are not robust to different staining procedures, magnifications, resolutions, scanners, or imaging protocols, nor to variations across clinical centers or patient cohorts. In addition, domain-specific data imbalances degrade the generalization performance of classifiers. Here, we train a robust CNN for WBC classification by addressing both cross-domain data imbalance and domain shift. To this end, we use two loss functions and demonstrate their effectiveness for out-of-distribution (OOD) generalization. Our approach achieves the best macro F1 score among the compared existing methods and accounts for rare cell types. This is the first demonstration of imbalanced domain generalization in hematological cytomorphology, and it paves the way for robust single-cell classification methods for use in laboratories and clinics.
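The abstract reports results as a macro F1 score. As a minimal illustration (not the authors' evaluation code), the sketch computes macro F1 as the unweighted mean of per-class F1 = 2TP / (2TP + FP + FN). Because every class counts equally, a classifier that ignores a rare cell type is penalized even when its accuracy looks high. The function name `macro_f1` and the toy labels are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        # Count true positives, false positives and false negatives for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    # Each class contributes equally, regardless of how many cells it has.
    return sum(scores) / len(scores)


# Hypothetical imbalanced toy data: 8 common cells, 2 rare blasts,
# and a classifier that always predicts the majority class.
y_true = ["neutrophil"] * 8 + ["blast"] * 2
y_pred = ["neutrophil"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                              # 0.8
print(round(macro_f1(y_true, y_pred), 3))    # 0.444
```

Here accuracy is 0.8, yet macro F1 is only about 0.444, because the rare class receives an F1 of 0. This is why macro F1 is a natural metric when rare cell types matter.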