Deep learning methods have become the state of the art for undersampled MR reconstruction. Self-supervised machine learning methods for reconstruction are increasingly used, particularly in cases where acquiring fully sampled ground-truth data is infeasible or impossible. However, potential issues in the validation of such methods, as well as their generalizability, remain underexplored. In this paper, we investigate important aspects of the validation of self-supervised algorithms for the reconstruction of undersampled MR images: quantitative evaluation of prospective reconstructions, potential differences between prospective and retrospective reconstructions, suitability of commonly used quantitative metrics, and generalizability. Two self-supervised algorithms, based on self-supervised denoising and the deep image prior, were investigated. These methods were compared to a least-squares fit and a compressed sensing reconstruction using in-vivo and phantom data. Their generalizability was tested with prospectively undersampled data acquired under experimental conditions different from those used in training. We show that prospective reconstructions can exhibit significant distortion relative to retrospective reconstructions/ground truth. Furthermore, pixel-wise quantitative metrics may not capture differences in perceptual quality accurately, in contrast to a perceptual metric. In addition, all methods showed potential for generalization; however, generalizability was more affected by changes in anatomy/contrast than by other changes. We further show that no-reference image metrics correspond well with human ratings of image quality when studying generalizability. Finally, we show that a well-tuned compressed sensing reconstruction and learned denoising perform similarly on all data.
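To make the retrospective setting and the notion of a pixel-wise metric concrete, the following is a minimal sketch and not the pipeline used in the paper. It assumes a single-coil, 2D Cartesian acquisition and a regular line-skipping mask with a fully sampled centre. The function names (`retrospective_undersample`, `zero_filled_recon`, `nrmse`), the mask pattern, and the synthetic disk image are all hypothetical choices made for illustration. Retrospective undersampling discards k-space lines from a fully sampled image, so a reference remains available for a metric such as NRMSE. A prospective acquisition has no such matched reference from the same scan.

```python
import numpy as np

def retrospective_undersample(image, accel=4, center_lines=8):
    """Simulate undersampling by masking k-space lines of a fully sampled image.

    Assumption: regular undersampling along the first (phase-encode) axis
    plus a fully sampled block of `center_lines` lines around the k-space centre.
    """
    kspace = np.fft.fftshift(np.fft.fft2(image))   # DC component moved to the centre
    ny = image.shape[0]
    mask = np.zeros(ny, dtype=bool)
    mask[::accel] = True                           # keep every accel-th line
    c = ny // 2
    mask[c - center_lines // 2 : c + center_lines // 2] = True
    return kspace * mask[:, None], mask

def zero_filled_recon(kspace):
    """Inverse FFT with unsampled lines left at zero (a baseline, not a learned method)."""
    return np.abs(np.fft.ifft2(np.fft.ifftshift(kspace)))

def nrmse(reference, estimate):
    """Pixel-wise normalized root-mean-square error against a reference image."""
    return np.linalg.norm(estimate - reference) / np.linalg.norm(reference)

# Synthetic stand-in for a fully sampled image: a bright disk on a dark background.
yy, xx = np.mgrid[-32:32, -32:32]
reference = (xx**2 + yy**2 < 20**2).astype(float)

kspace_us, mask = retrospective_undersample(reference, accel=4, center_lines=8)
recon = zero_filled_recon(kspace_us)
print(f"sampled lines: {mask.sum()} of {mask.size}, NRMSE: {nrmse(reference, recon):.3f}")
```

NRMSE compares images pixel by pixel, which is the kind of metric the abstract reports may miss differences in perceptual quality. A perceptual or no-reference metric would need a separate implementation that is not sketched here.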