Novel, high-performance medical image classification pipelines rely heavily on ensemble learning strategies. The idea of ensemble learning is to combine diverse models or multiple predictions and thereby boost prediction performance. However, it is still an open question to what extent ensemble learning is beneficial in deep learning based medical image classification pipelines, and which strategies help most. In this work, we propose a reproducible medical image classification pipeline for analyzing the performance impact of the following ensemble learning techniques: Augmenting, Stacking, and Bagging. The pipeline consists of state-of-the-art preprocessing and image augmentation methods as well as 9 deep convolutional neural network architectures. It was applied to four popular medical imaging datasets of varying complexity. Furthermore, 12 pooling functions for combining multiple predictions were analyzed, ranging from simple statistical functions such as unweighted averaging to more complex learning-based functions such as support vector machines. Our results revealed that Stacking achieved the largest performance gain, with an F1-score increase of up to 13%. Augmenting showed consistent improvements of up to 4% and is also applicable to pipelines based on a single model. Cross-validation based Bagging proved to be the most complex ensemble learning method and resulted in an F1-score decrease on all analyzed datasets (up to -10%). Furthermore, we demonstrated that simple statistical pooling functions perform as well as, and often better than, more complex pooling functions. We conclude that integrating the Stacking and Augmenting ensemble learning techniques is a powerful way for any medical image classification pipeline to improve robustness and boost performance.
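To make the notion of a pooling function concrete, the following is a minimal sketch of two simple statistical pooling functions: unweighted averaging and majority voting over the class probabilities of several ensemble members. It is an illustration under stated assumptions, not the pipeline's implementation: the function names `pool_mean` and `pool_majority_vote`, the array layout, and the toy probabilities are all hypothetical.

```python
import numpy as np


def pool_mean(probs: np.ndarray) -> np.ndarray:
    """Unweighted averaging of class probabilities.

    probs has the assumed shape (n_members, n_samples, n_classes).
    Returns an array of shape (n_samples, n_classes).
    """
    return probs.mean(axis=0)


def pool_majority_vote(probs: np.ndarray) -> np.ndarray:
    """Hard majority vote: each member votes for its most likely class.

    Returns the fraction of votes per class, shape (n_samples, n_classes).
    """
    n_members, _, n_classes = probs.shape
    votes = probs.argmax(axis=-1)  # (n_members, n_samples)
    counts = np.stack(
        [(votes == c).sum(axis=0) for c in range(n_classes)], axis=-1
    )
    return counts / n_members


# Toy example (made-up numbers): 3 ensemble members, 2 samples, 2 classes.
probs = np.array([
    [[0.6, 0.4], [0.2, 0.8]],
    [[0.9, 0.1], [0.6, 0.4]],
    [[0.4, 0.6], [0.3, 0.7]],
])

mean_pred = pool_mean(probs)
vote_pred = pool_majority_vote(probs)
labels_mean = mean_pred.argmax(axis=-1)
labels_vote = vote_pred.argmax(axis=-1)
```

The same functions apply whether the stacked predictions come from different architectures (Stacking), from multiple augmented views of one image passed through a single model (Augmenting), or from models trained on different cross-validation folds (Bagging); only the source of the first axis changes.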