Application of Transfer Learning and Ensemble Learning in Image-level Classification for Breast Histopathology

Background: Breast cancer is the most prevalent cancer in women worldwide. The classification and diagnosis of breast cancer from histopathological images have long been a focus of clinical concern. In Computer-Aided Diagnosis (CAD), traditional classification models mostly use a single network to extract features, which has significant limitations. In addition, many networks are trained and optimized on patient-level datasets, ignoring the lower-level (image-level) data labels. Method: This paper proposes a deep ensemble model based on image-level labels for the binary classification of breast histopathological images into benign and malignant lesions. First, the BreakHis dataset is randomly divided into training, validation, and test sets. Second, data augmentation techniques are used to balance the number of benign and malignant samples. Third, considering transfer learning performance and the complementarity between networks, VGG-16, Xception, ResNet-50, and DenseNet-201 are selected as the base classifiers. Result: In the ensemble network model with accuracy as the weight, image-level binary classification achieves an accuracy of $98.90\%$. To verify the capabilities of our method, recent Transformer and Multilayer Perceptron (MLP) models were compared experimentally on the same dataset. Our model outperforms them by $5\%-20\%$, highlighting the value of the ensemble model in classification tasks. Conclusion: This research focuses on improving classification performance with an ensemble algorithm. Transfer learning plays an essential role on small datasets, improving both training speed and accuracy. Our model outperforms many existing approaches in accuracy, offering a method for the field of auxiliary medical diagnosis.
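
The abstract does not spell out the ensemble rule, so the following is only a minimal sketch of one common reading of "accuracy as the weight": each base classifier's predicted malignant probability is averaged with weights proportional to that classifier's validation accuracy. The function name `accuracy_weighted_ensemble`, the 0.5 decision threshold, and all numeric values are hypothetical and are not taken from the paper.

```python
import numpy as np


def accuracy_weighted_ensemble(probs, val_accuracies, threshold=0.5):
    """Fuse per-model malignant probabilities with accuracy-based weights.

    probs: shape (n_models, n_images), each entry is P(malignant).
    val_accuracies: shape (n_models,), validation accuracy of each model.
    Returns the fused probabilities and the binary labels (1 = malignant).
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(val_accuracies, dtype=float)
    weights = weights / weights.sum()            # normalize weights to sum to 1
    fused = weights @ probs                      # weighted average per image
    labels = (fused >= threshold).astype(int)    # assumed decision threshold
    return fused, labels


# Hypothetical outputs of four base classifiers (for example VGG-16, Xception,
# ResNet-50, DenseNet-201) on three images; the values are illustrative only.
probs = [
    [0.90, 0.20, 0.60],
    [0.80, 0.40, 0.70],
    [0.70, 0.10, 0.55],
    [0.95, 0.30, 0.45],
]
val_acc = [0.95, 0.96, 0.97, 0.98]  # hypothetical validation accuracies

fused, labels = accuracy_weighted_ensemble(probs, val_acc)
print(labels)
```

Because the weights are normalized, a more accurate base classifier pulls the fused score slightly toward its own prediction, while no single network can decide the outcome alone.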
