Automatic anatomical landmark localization has made great strides by leveraging deep learning methods in recent years. The ability to quantify the uncertainty of these predictions is a vital component needed for these methods to be adopted in clinical settings, where it is imperative that erroneous predictions are caught and corrected. We propose Quantile Binning, a data-driven method to categorize predictions by uncertainty with estimated error bounds. Our framework can be applied to any continuous uncertainty measure, allowing straightforward identification of the best subset of predictions with accompanying estimated error bounds. We facilitate easy comparison between uncertainty measures by constructing two evaluation metrics derived from Quantile Binning. We compare and contrast three epistemic uncertainty measures (two baselines, and a proposed method combining aspects of the two), derived from two heatmap-based landmark localization model paradigms (U-Net and patch-based). We show results across three datasets, including a publicly available Cephalometric dataset. We illustrate how filtering out gross mispredictions caught in our Quantile Bins significantly improves the proportion of predictions under an acceptable error threshold. Finally, we demonstrate that Quantile Binning remains effective on landmarks with high aleatoric uncertainty caused by inherent landmark ambiguity, and offer recommendations on which uncertainty measure to use and how to use it. The code and data are available at https://github.com/schobs/qbin.
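The binning idea can be illustrated with a minimal sketch. It rests on several assumptions: bins are equal-count quantiles of the uncertainty values on a held-out validation set, and each bin's estimated error bound is the mean validation error of the predictions that fall in it. The helper names (`quantile_bin_thresholds`, `assign_bins`, `estimate_error_bounds`, `proportion_under`) and the synthetic data are hypothetical and used for illustration only. The released code at the repository above is the authoritative implementation, and it may define bins and bounds differently.

```python
import numpy as np


def quantile_bin_thresholds(val_uncertainty: np.ndarray, n_bins: int = 5) -> np.ndarray:
    """Interior quantile edges that split validation uncertainties into n_bins equal-count bins.

    Assumes higher values mean *more* uncertain; negate a confidence score
    (e.g. a heatmap peak value) before passing it in.
    """
    interior = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.quantile(val_uncertainty, interior)


def assign_bins(uncertainty: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """Bin index per prediction: 0 = least uncertain, len(thresholds) = most uncertain."""
    return np.digitize(uncertainty, thresholds, right=True)


def estimate_error_bounds(val_uncertainty, val_error, thresholds) -> np.ndarray:
    """Assumed bound estimate: mean validation error within each bin."""
    bins = assign_bins(val_uncertainty, thresholds)
    return np.array([val_error[bins == b].mean() for b in range(len(thresholds) + 1)])


def proportion_under(errors, bins, max_bin: int, threshold: float) -> float:
    """Fraction of kept predictions (bin <= max_bin) whose error is below threshold."""
    kept = errors[bins <= max_bin]
    return float(np.mean(kept < threshold))


if __name__ == "__main__":
    # Synthetic data for illustration only: error grows with uncertainty.
    rng = np.random.default_rng(0)
    val_unc = rng.uniform(0, 1, 1000)
    val_err = 5 * val_unc + np.abs(rng.normal(0, 0.5, 1000))
    thresholds = quantile_bin_thresholds(val_unc, n_bins=5)
    bounds = estimate_error_bounds(val_unc, val_err, thresholds)

    # New predictions inherit the estimated bound of the bin they land in.
    test_unc = rng.uniform(0, 1, 200)
    test_err = 5 * test_unc + np.abs(rng.normal(0, 0.5, 200))
    test_bins = assign_bins(test_unc, thresholds)
    print("estimated bound per bin:", np.round(bounds, 2))
    print("bounds for first 5 test predictions:", np.round(bounds[test_bins[:5]], 2))

    # Filtering: drop the most uncertain bin and compare the share under a threshold.
    print("all bins     :", proportion_under(test_err, test_bins, 4, 2.0))
    print("drop top bin :", proportion_under(test_err, test_bins, 3, 2.0))
```

Because the thresholds come from quantiles rather than fixed cut-offs, the same procedure applies unchanged to any continuous uncertainty measure, which is what makes different measures comparable bin-for-bin.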