In medical imaging, training machine learning models requires a large and varied dataset to ensure robustness and interoperability. Acquiring such diverse, heterogeneous data is difficult, however, because each image needs expert labeling and because medical data raise privacy concerns. Data augmentation has emerged as a promising, cost-effective way around these obstacles by increasing the size and diversity of the training dataset. In this study, we provide a comprehensive review of the data augmentation techniques employed in medical imaging and explore their benefits. We conducted an in-depth study of all data augmentation techniques used in medical imaging, identifying 11 different purposes and collecting 65 distinct techniques. We grouped the techniques into the following categories: spatial transformation-based, color and contrast adjustment-based, noise-based, deformation-based, data mixing-based, filters and mask-based, division-based, multi-scale and multi-view-based, and meta-learning-based. We observed that some techniques require all parameters to be specified manually, while others rely on automation to adjust the type and magnitude of augmentation to the task requirements. These techniques enable the development of more robust models that can be applied in domains where data are limited or hard to obtain. We expect the list of available techniques to grow, giving researchers additional options to consider.
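To make a few of these categories concrete, the following is a minimal sketch of four manually parameterized augmentations, one each from the spatial transformation, noise, contrast adjustment, and data mixing categories. It is illustrative only and is not drawn from any of the reviewed works: the function names, the representation of an image as a nested list of intensities in [0, 1], and the parameter values are all assumptions made for this example.

```python
import random

# Illustrative assumption: an image is a list of rows of floats in [0, 1].

def flip_horizontal(image):
    """Spatial transformation: mirror every row left to right."""
    return [row[::-1] for row in image]

def add_gaussian_noise(image, sigma, rng):
    """Noise-based: add zero-mean Gaussian noise, then clip to [0, 1]."""
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]

def adjust_contrast(image, factor):
    """Contrast adjustment: scale each pixel's deviation from the mean."""
    pixels = [px for row in image for px in row]
    mean = sum(pixels) / len(pixels)
    return [[min(1.0, max(0.0, mean + factor * (px - mean))) for px in row]
            for row in image]

def mixup(image_a, image_b, lam):
    """Data mixing: pixel-wise convex combination of two same-sized images."""
    return [[lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]

# Hypothetical usage on two small synthetic "scans".
rng = random.Random(0)
scan_a = [[rng.random() for _ in range(4)] for _ in range(4)]
scan_b = [[rng.random() for _ in range(4)] for _ in range(4)]

augmented = add_gaussian_noise(
    adjust_contrast(flip_horizontal(scan_a), factor=1.2), sigma=0.05, rng=rng)
mixed = mixup(scan_a, scan_b, lam=0.7)
```

Here every parameter (`sigma`, `factor`, `lam`) is fixed by hand, which corresponds to the manually specified end of the spectrum described above; an automated approach would instead select the type and magnitude of augmentation according to the task.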