Critical Evaluation of Deep Neural Networks for Wrist Fracture Detection

Abu Mohammed Raisuddin, Elias Vaattovaara, Mika Nevalainen, Marko Nikki, Elina Järvenpää, Kaisa Makkonen, Pekka Pinola, Tuula Palsio, Arttu Niemensivu, Osmo Tervonen, Aleksei Tiulpin

Wrist fracture is the most common type of fracture and has a high incidence rate. Conventional radiography (i.e. X-ray imaging) is routinely used for wrist fracture detection, but fracture delineation is occasionally difficult, and additional confirmation by computed tomography (CT) is then needed for diagnosis. Recent advances in Deep Learning (DL), a subfield of Artificial Intelligence (AI), have shown that wrist fracture detection can be automated using Convolutional Neural Networks. However, previous studies did not pay close attention to the difficult cases that can only be confirmed via CT imaging. In this study, we developed and analyzed DeepWrist, a state-of-the-art DL-based pipeline for wrist (distal radius) fracture detection, and evaluated it on one general population test set and one challenging test set comprising only cases requiring confirmation by CT. Our results reveal that a typical state-of-the-art approach such as DeepWrist, while achieving near-perfect performance on the general independent test set, performs substantially worse on the challenging test set: average precision of 0.99 (0.99-0.99) vs 0.64 (0.46-0.83), respectively. Similarly, the area under the ROC curve was 0.99 (0.98-0.99) vs 0.84 (0.72-0.93), respectively. Our findings highlight the importance of a meticulous analysis of DL-based models before clinical use, and reveal the need for more challenging settings for testing medical AI systems.
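The two metrics reported above, average precision and area under the ROC curve, each come with an interval in parentheses. The abstract does not say how those intervals were obtained, so the following is only a minimal sketch that assumes a percentile bootstrap over test cases. The function names (`roc_auc`, `average_precision`, `bootstrap_ci`) and the toy labels and scores are hypothetical and are not taken from DeepWrist.

```python
import random


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision measured at the rank of each positive case.

    Assumes no tied scores; ties are broken by input order.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive case")
    return total / hits


def bootstrap_ci(metric, labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed method, not the paper's)."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to resample")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # skip resamples that contain a single class
            continue
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data: 1 = fracture, 0 = no fracture; scores are model probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
auc_ci = bootstrap_ci(roc_auc, labels, scores, n_boot=200)
```

With few test cases the resampled values vary widely, so the bootstrap interval is broad. This may partly explain why the interval on the challenging test set is much wider than on the general one, although the abstract does not give the test set sizes.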
