Recognizing the types of white blood cells (WBCs) in microscopic images of human blood smears is a fundamental task in pathology and hematology. Although previous studies have contributed significantly to the development of methods and datasets, few have established benchmarks or baselines that others can readily reference. For instance, we observed notable variations in the reported accuracies of the same Convolutional Neural Network (CNN) model across different studies, yet no public implementation exists to reproduce these results. In this paper, we establish a benchmark for WBC recognition. Our results indicate that CNN-based models achieve high accuracy when trained and tested under similar imaging conditions, but their performance drops significantly when they are tested under different conditions. Moreover, the ResNet classifier, which has been widely employed in previous work, generalizes unreasonably poorly under domain shifts because of batch normalization. We investigate this issue and suggest alternative normalization techniques that can mitigate it. We make fully reproducible code publicly available\footnote{\url{https://github.com/apple2373/wbc-benchmark}}.
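To make the batch normalization claim concrete, the following toy sketch illustrates one plausible mechanism. It assumes that the harm comes from applying normalization statistics stored on the training domain to test images acquired under different conditions. All data, the gain and offset values, and the function names are hypothetical. The per-sample normalization shown is only an illustrative stand-in, since the abstract does not name the alternative techniques evaluated, and this is not the benchmark code from the repository.

```python
import random
import statistics


def batchnorm_inference(x, running_mean, running_var, eps=1e-5):
    """Normalize with statistics stored from the training domain,
    as batch normalization does at test time."""
    scale = (running_var + eps) ** 0.5
    return [(v - running_mean) / scale for v in x]


def per_sample_norm(x, eps=1e-5):
    """Normalize with the input's own statistics, in the spirit of
    instance, layer, or group normalization."""
    mean = statistics.fmean(x)
    scale = (statistics.pvariance(x) + eps) ** 0.5
    return [(v - mean) / scale for v in x]


random.seed(0)

# Hypothetical feature activations from the source (training) domain.
source = [random.gauss(0.0, 1.0) for _ in range(1000)]
running_mean = statistics.fmean(source)
running_var = statistics.pvariance(source)

# Hypothetical target domain: the same content under a different gain and
# offset, a crude stand-in for a change in staining or illumination.
target = [2.0 * v + 3.0 for v in source]

bn_src = batchnorm_inference(source, running_mean, running_var)
bn_tgt = batchnorm_inference(target, running_mean, running_var)
ps_src = per_sample_norm(source)
ps_tgt = per_sample_norm(target)

# Largest element-wise disagreement between the two domains after normalization.
bn_gap = max(abs(a - b) for a, b in zip(bn_src, bn_tgt))
ps_gap = max(abs(a - b) for a, b in zip(ps_src, ps_tgt))

print(f"stored-statistics gap: {bn_gap:.3f}")
print(f"per-sample gap:        {ps_gap:.6f}")
```

In this toy setting, the stored statistics leave the shifted features far from what downstream layers saw during training, whereas per-sample statistics absorb the gain and offset almost entirely. Real imaging shifts are not purely affine, so the sketch shows the direction of the effect rather than its magnitude.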