Medical images commonly exhibit multiple abnormalities. Predicting them requires multi-class classifiers whose training and reliability can be affected by a combination of factors, such as dataset size, data source and distribution, and the loss function used to train the deep neural networks. Currently, cross-entropy loss remains the de facto loss function for training deep learning classifiers. This loss function, however, weights learning from all classes equally, which biases the model toward the majority class. In this work, we benchmark various state-of-the-art loss functions that are suitable for multi-class classification, critically analyze model performance, and propose improved loss functions. We select a pediatric chest X-ray (CXR) dataset that includes images with no abnormality (normal) and images exhibiting manifestations consistent with bacterial and viral pneumonia. We construct prediction-level and model-level ensembles to improve classification performance. Our results show that, compared to the individual models and the state-of-the-art literature, weighted averaging of the predictions of the top-3 and top-5 model-level ensembles delivers significantly superior classification performance (p < 0.05) in terms of the Matthews correlation coefficient (MCC = 0.9068; 95% confidence interval: 0.8839 to 0.9297). Finally, we perform localization studies to interpret model behavior, visualizing and confirming that the individual models and ensembles learned meaningful features and highlighted disease manifestations.
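The majority-class bias of plain cross-entropy can be illustrated with a minimal pure-Python sketch. It assumes inverse-frequency class weights, N / (K * n_c), the same heuristic scikit-learn calls `class_weight='balanced'`; this is one common remedy, not necessarily one of the loss functions benchmarked or proposed in this work. The helper names are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Hypothetical helper: weight class c by N / (K * n_c) (an assumed heuristic)."""
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]


def softmax(logits):
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits_batch, labels, class_weights=None):
    """Mean categorical cross-entropy, optionally class-weighted.

    With weights, the sum is normalized by the total weight of the targets,
    the convention PyTorch's CrossEntropyLoss uses for reduction='mean'.
    """
    total, norm = 0.0, 0.0
    for logits, y in zip(logits_batch, labels):
        w = 1.0 if class_weights is None else class_weights[y]
        total += -w * math.log(softmax(logits)[y])
        norm += w
    return total / norm


# Imbalanced toy labels: 6 x class 0, 2 x class 1, 1 x class 2.
labels = [0] * 6 + [1] * 2 + [2]
weights = inverse_frequency_weights(labels, 3)   # [0.5, 1.5, 3.0]
logits = [[2.0, 0.0, 0.0]] * 9                   # every sample favors class 0
print(cross_entropy(logits, labels), cross_entropy(logits, labels, weights))
```

When every prediction favors the majority class, the weighted loss is larger than the unweighted one, so minority-class errors contribute more to the gradient instead of being diluted by the many easy majority-class samples.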
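Weighted averaging of ensemble predictions and the multi-class MCC can be sketched as follows. The ensemble weights are hypothetical inputs, since the abstract does not say how they were chosen; the MCC function follows the standard multi-class generalization computed from the confusion matrix (the form scikit-learn's `matthews_corrcoef` uses). The function names are illustrative, not the authors' code.

```python
import math


def weighted_average(prob_lists, weights):
    """Combine per-model class-probability vectors by a normalized weighted mean.

    prob_lists[m][i][c] is model m's probability for class c on sample i;
    weights[m] is a hypothetical non-negative weight for model m.
    """
    total = sum(weights)
    n_samples, n_classes = len(prob_lists[0]), len(prob_lists[0][0])
    return [[sum(w * probs[i][c] for w, probs in zip(weights, prob_lists)) / total
             for c in range(n_classes)]
            for i in range(n_samples)]


def argmax(v):
    """Index of the largest entry (the predicted class)."""
    return max(range(len(v)), key=v.__getitem__)


def multiclass_mcc(y_true, y_pred, num_classes):
    """Multi-class Matthews correlation coefficient from the confusion matrix."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    s = len(y_true)                                   # number of samples
    c = sum(cm[k][k] for k in range(num_classes))     # correctly classified
    t_k = [sum(cm[k]) for k in range(num_classes)]    # true count per class
    p_k = [sum(cm[r][k] for r in range(num_classes))  # predicted count per class
           for k in range(num_classes)]
    num = c * s - sum(t * p for t, p in zip(t_k, p_k))
    den = math.sqrt((s * s - sum(p * p for p in p_k)) *
                    (s * s - sum(t * t for t in t_k)))
    return num / den if den else 0.0


# Two hypothetical models disagreeing on one sample.
model_a = [[0.6, 0.3, 0.1]]
model_b = [[0.2, 0.7, 0.1]]
print(argmax(weighted_average([model_a, model_b], [2, 1])[0]))  # favors model_a
print(multiclass_mcc([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2], 3))
```

Unlike accuracy, MCC stays at 0 when a classifier predicts only one class, which makes it a more informative summary on imbalanced data such as the normal, bacterial, and viral pneumonia classes here.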