Modeling 3D context is essential for high-performance 3D medical image analysis. Although 2D networks benefit from large-scale supervised pretraining on 2D images, they are weak at capturing 3D context. 3D networks are strong in modeling 3D context yet lack supervised pretraining. As an emerging technique, the \emph{3D context fusion operator}, which enables conversion from pretrained 2D networks, leverages the advantages of both and has achieved great success. Existing 3D context fusion operators are designed to be spatially symmetric, i.e., performing identical operations on each 2D slice, like convolutions. However, these operators are not truly translation-equivariant, especially when only a few 3D slices are used as inputs. In this paper, we propose a novel asymmetric 3D context fusion operator (A3D) that uses different weights to fuse 3D context from different 2D slices. Notably, A3D is NOT translation-equivariant, yet it significantly outperforms existing symmetric context fusion operators without introducing large computational overhead. We validate the effectiveness of the proposed method with extensive experiments on the DeepLesion benchmark, a large-scale public dataset for universal lesion detection from computed tomography (CT). The proposed A3D consistently outperforms symmetric context fusion operators by considerable margins and establishes a new \emph{state of the art} on DeepLesion. To facilitate open research, our PyTorch code and models are available at https://github.com/M3DV/AlignShift.
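To make the idea of asymmetric fusion concrete, below is a minimal PyTorch sketch, assuming the operator acts on slice-wise features of shape (N, C, D, H, W) from a 2D backbone and fuses them with a learnable per-output-slice mixing matrix along the depth axis; the module name `A3DFusion` and the identity initialization are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn


class A3DFusion(nn.Module):
    """Illustrative sketch of asymmetric 3D context fusion (assumption, not the official code).

    Input: features of shape (N, C, D, H, W) produced by applying a 2D network
    slice by slice. Each output slice is a learnable linear combination of all
    D input slices, with a distinct weight vector per output slice, so the
    operator is deliberately NOT translation-equivariant along depth.
    """

    def __init__(self, num_slices: int):
        super().__init__()
        # (D, D) mixing matrix; identity init means "no fusion" at the start.
        self.mix = nn.Parameter(torch.eye(num_slices))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Mix features across the depth (slice) dimension only.
        # x: (N, C, D, H, W) -> (N, C, D, H, W)
        return torch.einsum("ed,ncdhw->ncehw", self.mix, x)


if __name__ == "__main__":
    feats = torch.randn(2, 64, 7, 32, 32)  # e.g., 7 CT slices of 64-channel features
    fused = A3DFusion(num_slices=7)(feats)
    print(fused.shape)  # torch.Size([2, 64, 7, 32, 32])
```

Because the mixing weights depend on the absolute slice index, shifting the input stack along depth changes the output, which is exactly the asymmetry the abstract contrasts with symmetric, convolution-like fusion.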