The transformer has been used primarily in natural language processing. Recently, it has also been adopted in computer vision (CV), where it shows promise. Medical image analysis (MIA), a critical branch of CV, also benefits greatly from this state-of-the-art technique. In this review, we first recap the core component of the transformer, the attention mechanism, and then the detailed structure of the transformer. After that, we describe recent progress of the transformer in MIA. We organize the applications by task, including classification, segmentation, captioning, registration, detection, reconstruction, denoising, localization, and synthesis. The mainstream classification and segmentation tasks are further divided into eleven medical image modalities. Finally, we discuss the open challenges and future opportunities in this field. With its up-to-date coverage, detailed information, and task-modality organization, this review may greatly benefit the broad MIA community.