Large medical imaging data sets are becoming increasingly available. A common challenge with these data sets is ensuring that each sample meets minimum quality requirements and is free of significant artefacts. Although a wide range of automatic methods has been developed to identify imperfections and artefacts in medical images, most of them are data-hungry. In particular, the scarcity of scans with artefacts available for training has created a barrier to designing and deploying machine learning in clinical research. To tackle this problem, we propose a novel framework with four main components: (1) a set of artefact generators inspired by magnetic resonance physics that corrupt brain MRI scans and augment a training data set, (2) a set of abstract and engineered features that represent images compactly, (3) a feature selection process that depends on the class of artefact to improve classification performance, and (4) a set of Support Vector Machine (SVM) classifiers trained to identify artefacts. Our contributions are threefold. First, we use the physics-based artefact generators to produce synthetic brain MRI scans with controlled artefacts as a data augmentation technique, which avoids the labour-intensive collection and labelling of scans with rare artefacts. Second, we propose a large pool of abstract and engineered image features developed to identify nine different artefacts in structural MRI. Finally, we use an artefact-based feature selection block that, for each class of artefact, finds the set of features that provides the best classification performance. We performed validation experiments on a large data set of scans with artificially generated artefacts and on a multiple sclerosis clinical trial in which real artefacts were identified by experts, showing that the proposed pipeline outperforms traditional methods.
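To make the idea of a physics-inspired artefact generator concrete, the following is a minimal sketch of one such corruption. It is not the generator used in this work, and the abstract does not specify how the generators are implemented. The sketch assumes a rigid in-plane translation during acquisition and relies on the Fourier shift theorem: a shift in image space is a linear phase ramp in k-space. The function name `add_motion_artefact` and its parameters are hypothetical.

```python
import numpy as np


def add_motion_artefact(image, shift_pixels=2.0, corrupted_fraction=0.3, seed=0):
    """Corrupt a 2D slice with a simulated translation artefact.

    Hypothetical helper for illustration only. A random subset of
    k-space rows (phase-encoding lines) is modified as if the object
    had moved by `shift_pixels` along axis 0 while those lines were
    being acquired.
    """
    rng = np.random.default_rng(seed)
    n_rows, _ = image.shape

    # Forward transform: image space -> k-space.
    kspace = np.fft.fft2(image)

    # Fourier shift theorem: a shift d along axis 0 multiplies each
    # k-space row by exp(-2*pi*i * k_y * d), with k_y in cycles/pixel.
    ky = np.fft.fftfreq(n_rows)[:, None]
    phase_ramp = np.exp(-2j * np.pi * ky * shift_pixels)

    # Only the randomly chosen rows see the shifted object.
    corrupted_rows = rng.random(n_rows) < corrupted_fraction
    kspace[corrupted_rows, :] *= phase_ramp[corrupted_rows]

    # Back to image space; scanners typically store magnitude images.
    return np.abs(np.fft.ifft2(kspace))


# Toy "scan": a bright rectangle on a dark background.
phantom = np.zeros((64, 64))
phantom[16:48, 20:44] = 1.0
corrupted = add_motion_artefact(phantom, shift_pixels=2.0, corrupted_fraction=0.3)
```

Because the severity parameters (`shift_pixels`, `corrupted_fraction`) are explicit, each synthetic scan comes with a known label, which is what makes this kind of generator usable for augmenting a training set without manual annotation.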