Vision Transformers for femur fracture classification

Objectives: In recent years, the scientific community has focused on developing Computer-Aided Diagnosis (CAD) tools that could improve the classification of bone fractures. However, the results obtained so far when classifying fractures into subtypes on the proposed datasets were far from optimal. This paper proposes a recent, high-performing deep learning technique, the Vision Transformer (ViT), to improve fracture classification by exploiting its self-attention mechanism. Methods: 4207 manually annotated images were used and divided into fracture types following the AO/OTA classification, forming the largest labeled dataset of proximal femur fractures used in the literature. The ViT architecture was compared with a classic Convolutional Neural Network (CNN) and with a multistage architecture composed of successive CNNs in cascade. To demonstrate the reliability of this approach, 1) attention maps were used to visualize the most relevant areas of the images, 2) the performance of a generic CNN and the ViT was also compared through unsupervised learning techniques, and 3) 11 specialists were asked to evaluate and classify 150 images of proximal femur fractures with and without the help of the ViT. Results: The ViT correctly predicted 83% of the test images. Precision, recall and F1-score were 0.77 (CI 0.64-0.90), 0.76 (CI 0.62-0.91) and 0.77 (CI 0.64-0.89), respectively. The average diagnostic improvement of the specialists was 29%. Conclusions: This paper shows the potential of Transformers in bone fracture classification. For the first time, good results were obtained on fracture subtypes with the largest and richest dataset to date.
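For readers less familiar with the metrics reported in the Results, the following is a minimal sketch of how accuracy, precision, recall and F1-score can be computed for a multi-class classifier. It assumes macro-averaging (an unweighted mean over classes), which the abstract does not specify. The function names, the class labels and the toy predictions are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision_recall_f1(y_true, y_pred):
    """Macro-averaged precision, recall and F1 (assumed averaging scheme)."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted, but wrongly
            fn[t] += 1   # a sample of class t was missed

    precisions, recalls, f1s = [], [], []
    for c in classes:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        prec = tp[c] / predicted if predicted else 0.0
        rec = tp[c] / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy labels, for illustration only (not the paper's data).
y_true = ["A1", "A1", "A2", "A3", "B1", "B1"]
y_pred = ["A1", "A2", "A2", "A3", "B1", "A1"]

acc = accuracy(y_true, y_pred)
precision, recall, f1 = macro_precision_recall_f1(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Macro-averaging gives every fracture type the same weight regardless of how many images it contains, which matters when subtypes are unevenly represented. A weighted average would be an equally plausible reading of the reported figures.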
