Cross Attention Transformers for Multi-modal Unsupervised Whole-Body PET Anomaly Detection

Cancer is a highly heterogeneous condition that can occur almost anywhere in the human body. 18F-fluorodeoxyglucose positron emission tomography (18F-FDG PET) is an imaging modality commonly used to detect cancer due to its high sensitivity and clear visualisation of the pattern of metabolic activity. Nonetheless, as cancer is highly heterogeneous, it is challenging to train general-purpose discriminative cancer detection models, with data availability and disease complexity often cited as limiting factors. Unsupervised anomaly detection models have been suggested as a putative solution. These models learn a healthy representation of tissue and detect cancer by predicting deviations from the healthy norm, which requires highly expressive models capable of accurately learning long-range interactions between organs and their imaging patterns. Transformers satisfy these requirements and have been shown to generate state-of-the-art results in unsupervised anomaly detection when trained on normal data. This work expands upon such approaches by introducing multi-modal conditioning of the transformer via cross-attention, i.e. supplying an anatomical reference from paired CT. Using 294 whole-body PET/CT samples, we show that our anomaly detection method is robust and capable of achieving accurate cancer localisation even in cases where normal training data is unavailable. In addition, we show the efficacy of this approach on out-of-sample data, showcasing its generalisability with limited training data. Lastly, we propose to combine model uncertainty with a new kernel density estimation approach, and show that it provides clinically and statistically significant improvements over classic residual-based anomaly maps. Overall, superior performance is demonstrated against leading state-of-the-art alternatives, drawing attention to the potential of these approaches.
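
To make the idea of conditioning via cross-attention concrete, the following is a minimal sketch of a single cross-attention head in plain Python, in which queries come from PET token embeddings and keys and values come from CT token embeddings. It is an illustration under stated assumptions, not the architecture used in this work: the function names (`cross_attention`, `matmul`, `softmax`, `random_matrix`), the toy dimensions, and the random weights are all hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(pet_tokens, ct_tokens, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper).

    Queries are projected from PET tokens; keys and values are projected
    from CT tokens, so each PET position gathers anatomical context.
    """
    q = matmul(pet_tokens, w_q)              # shape (n_pet, d)
    k = matmul(ct_tokens, w_k)               # shape (n_ct, d)
    v = matmul(ct_tokens, w_v)               # shape (n_ct, d)
    scale = math.sqrt(len(q[0]))             # standard 1/sqrt(d) scaling
    k_t = [list(col) for col in zip(*k)]     # transpose to (d, n_ct)
    scores = matmul(q, k_t)                  # shape (n_pet, n_ct)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights       # context has shape (n_pet, d)


def random_matrix(rows, cols, rng):
    """Toy Gaussian matrix standing in for learned embeddings or weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model = 4                                   # assumed toy embedding size
pet_tokens = random_matrix(3, d_model, rng)   # 3 toy PET token embeddings
ct_tokens = random_matrix(5, d_model, rng)    # 5 toy CT token embeddings
w_q = random_matrix(d_model, d_model, rng)
w_k = random_matrix(d_model, d_model, rng)
w_v = random_matrix(d_model, d_model, rng)

context, weights = cross_attention(pet_tokens, ct_tokens, w_q, w_k, w_v)
```

Each row of `weights` is a probability distribution over CT positions, and the matching row of `context` is the CT-derived summary that would be combined with the PET representation at that position. In a full transformer this step would sit alongside self-attention over the PET sequence, with learned rather than random projections.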
