Cancer is one of the leading causes of death worldwide, and head and neck (H&N) cancer is among the most prevalent types. Positron emission tomography (PET) and computed tomography (CT) are used to detect, segment, and quantify the tumor region. In clinical practice, manual tumor segmentation is time-consuming and prone to error. Machine learning, and deep learning in particular, can help automate this process and can yield results as accurate as those of a clinician. In this paper, we investigate a vision transformer-based method to automatically delineate H&N tumors, and compare its results to leading convolutional neural network (CNN)-based models. We use multi-modal data from CT and PET scans to perform the segmentation task. We show that a transformer-based model has the potential to achieve results comparable to those of CNN-based models. With cross-validation, the model achieves a mean Dice similarity coefficient (DSC) of 0.736, mean precision of 0.766, and mean recall of 0.766. This DSC is only 0.021 lower than that of the 2020 competition-winning model (cross-validated in-house). On the testing set, the model performs similarly, with a DSC of 0.736, precision of 0.773, and recall of 0.760; this DSC is only 0.023 lower than that of the 2020 competition-winning model. This work shows that cancer segmentation via transformer-based models is a promising research area to explore further.
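For readers unfamiliar with the reported metrics, the following is a minimal sketch of how DSC, precision, and recall are commonly computed from a predicted and a ground-truth binary mask. It is not the evaluation code used in this work: the function name `segmentation_scores`, the dictionary keys, and the convention of returning 1.0 when a denominator is zero (for example, when both masks are empty) are assumptions made for illustration.

```python
import numpy as np


def segmentation_scores(pred, target):
    """Compute DSC, precision and recall for two binary masks.

    Hypothetical helper for illustration only; `pred` and `target`
    are array-likes of the same shape, where nonzero means "tumor".
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    # Voxel-wise confusion counts.
    tp = int(np.logical_and(pred, target).sum())   # true positives
    fp = int(np.logical_and(pred, ~target).sum())  # false positives
    fn = int(np.logical_and(~pred, target).sum())  # false negatives

    def ratio(num, den):
        # Assumed convention: a zero denominator scores 1.0.
        return num / den if den > 0 else 1.0

    return {
        "dsc": ratio(2 * tp, 2 * tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


if __name__ == "__main__":
    # Toy 2D masks; real PET/CT masks would be 3D volumes.
    predicted = [[1, 1, 0], [0, 1, 0]]
    reference = [[1, 0, 0], [0, 1, 1]]
    print(segmentation_scores(predicted, reference))
```

Under these definitions, DSC equals 2TP / (2TP + FP + FN), so it penalizes false positives and false negatives equally, while precision and recall separate the two error types.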