Purpose: To investigate aspects of the validation of self-supervised algorithms for reconstruction of undersampled MR images: quantitative evaluation of prospective reconstructions, potential differences between prospective and retrospective reconstructions, suitability of commonly used quantitative metrics, and generalizability. Theory and Methods: Two self-supervised algorithms based on self-supervised denoising and neural network image priors were investigated. These methods were compared to a least-squares fit and a compressed sensing reconstruction using in-vivo and phantom data. Their generalizability was tested with prospectively undersampled data acquired under experimental conditions different from those used for training. Results: Prospective reconstructions can exhibit significant distortion relative to retrospective reconstructions/ground truth. Pixel-wise quantitative metrics may not capture differences in perceptual quality accurately, whereas a perceptual metric can. All methods showed potential for generalization; generalizability is affected more by changes in anatomy/contrast than by other changes. No-reference image metrics correspond well with human ratings of image quality when studying generalizability. Compressed sensing and learned denoising perform similarly well on all data. Conclusion: Self-supervised methods show promising results for accelerating image reconstruction in clinical routines. Nonetheless, more work is required to investigate standardized methods for validating reconstruction algorithms for future clinical use.
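To make the distinction between retrospective undersampling and pixel-wise evaluation concrete, the following is a minimal sketch and not the protocol used in this work. It assumes a single-coil, 2D Cartesian acquisition; the function names, the random line mask, the acceleration factor, and the synthetic disk phantom are all illustrative assumptions. A fully sampled image is transformed to k-space, phase-encode lines are discarded, and the zero-filled reconstruction is scored against the fully sampled reference with two common pixel-wise metrics (NRMSE and PSNR).

```python
import numpy as np


def retrospective_undersample(image, accel=4, center_lines=8, seed=0):
    """Mask phase-encode lines of a fully sampled k-space (hypothetical helper).

    Returns the masked, centered k-space and the boolean line mask.
    """
    rng = np.random.default_rng(seed)
    ny, _ = image.shape
    # Centered k-space: the DC component sits at row/column index n // 2.
    kspace = np.fft.fftshift(np.fft.fft2(image))
    # Keep each line with probability 1/accel (assumed sampling scheme).
    mask = rng.random(ny) < 1.0 / accel
    # Always keep a fully sampled band around the k-space center.
    c = ny // 2
    mask[c - center_lines // 2 : c + center_lines // 2] = True
    return kspace * mask[:, None], mask


def zero_filled_recon(kspace):
    """Inverse FFT of the masked k-space; missing lines stay zero."""
    return np.abs(np.fft.ifft2(np.fft.ifftshift(kspace)))


def nrmse(ref, rec):
    """Root-mean-square error normalized by the norm of the reference."""
    return np.linalg.norm(rec - ref) / np.linalg.norm(ref)


def psnr(ref, rec):
    """Peak signal-to-noise ratio in dB, using the reference maximum as peak."""
    mse = np.mean((rec - ref) ** 2)
    return 10.0 * np.log10(ref.max() ** 2 / mse)


if __name__ == "__main__":
    # Synthetic disk phantom standing in for a fully sampled reference image.
    y, x = np.mgrid[-64:64, -64:64]
    phantom = (x**2 + y**2 < 40**2).astype(float)

    k_us, mask = retrospective_undersample(phantom, accel=4)
    recon = zero_filled_recon(k_us)
    print("sampled lines:", int(mask.sum()), "of", mask.size)
    print("NRMSE:", nrmse(phantom, recon))
    print("PSNR (dB):", psnr(phantom, recon))
```

Both metrics require a fully sampled reference, which is available here only because the undersampling is simulated. For prospectively undersampled data the omitted k-space lines are never acquired, so a reference has to come from a separate scan, which is one reason prospective and retrospective results can differ.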