3D imaging enables accurate diagnosis by providing spatial information about organ anatomy. However, using 3D images to train AI models is computationally challenging because they consist of 10x or 100x more pixels than their 2D counterparts. To be trained with high-resolution 3D images, convolutional neural networks resort to downsampling them or projecting them to 2D. We propose an effective alternative, a neural network that enables efficient classification of full-resolution 3D medical images. Compared to off-the-shelf convolutional neural networks, our network, 3D Globally-Aware Multiple Instance Classifier (3D-GMIC), uses 77.98%-90.05% less GPU memory and 91.23%-96.02% less computation. While it is trained only with image-level labels, without segmentation labels, it explains its predictions by providing pixel-level saliency maps. On a dataset collected at NYU Langone Health, including 85,526 patients with full-field 2D mammography (FFDM), synthetic 2D mammography, and 3D mammography, 3D-GMIC achieves an AUC of 0.831 (95% CI: 0.769-0.887) in classifying breasts with malignant findings using 3D mammography. This is comparable to the performance of GMIC on FFDM (0.816, 95% CI: 0.737-0.878) and synthetic 2D (0.826, 95% CI: 0.754-0.884), which demonstrates that 3D-GMIC successfully classified large 3D images despite focusing computation on a smaller percentage of its input compared to GMIC. Therefore, 3D-GMIC identifies and utilizes extremely small regions of interest from 3D images consisting of hundreds of millions of pixels, dramatically reducing associated computational challenges. 3D-GMIC generalizes well to BCS-DBT, an external dataset from Duke University Hospital, achieving an AUC of 0.848 (95% CI: 0.798-0.896).
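The key efficiency idea above — scoring a full-resolution image with a saliency map and then running expensive computation only on a few small, high-saliency regions — can be sketched as follows. This is a hypothetical simplification for illustration, not the authors' 3D-GMIC implementation; the function name, patch size, and greedy suppression scheme are all assumptions.

```python
import numpy as np

def select_top_k_patches(saliency, k=2, patch=4):
    """Greedily pick the k highest-saliency patch centers from a 2D
    saliency map (a simplified stand-in for GMIC-style ROI proposal).
    After each pick, the surrounding patch is suppressed so the next
    pick lands on a different region."""
    s = saliency.astype(float).copy()
    coords = []
    for _ in range(k):
        idx = np.unravel_index(np.argmax(s), s.shape)
        coords.append(idx)
        # Suppress the selected patch so it is not chosen again.
        r0 = max(0, idx[0] - patch // 2)
        c0 = max(0, idx[1] - patch // 2)
        s[r0:r0 + patch, c0:c0 + patch] = -np.inf
    return coords

# Toy example: a 16x16 "image" with two salient spots.
saliency = np.zeros((16, 16))
saliency[2, 3] = 5.0
saliency[10, 12] = 9.0
rois = select_top_k_patches(saliency, k=2, patch=4)
```

With k=2 patches of 4x4 pixels, the downstream classifier would only process 32 of the 256 pixels (12.5% of the input); on a 3D mammogram with hundreds of millions of pixels, this kind of sparse ROI selection is what makes full-resolution classification tractable.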