Pretraining has sparked a groundswell of interest in deep learning workflows that learn from limited data and improve generalization. While pretraining is common for 2D image classification tasks, its application to 3D medical imaging tasks such as chest CT interpretation remains limited. We explore whether pretraining a model on realistic videos, rather than training it from scratch, improves performance on tuberculosis type classification from chest CT scans. To incorporate both spatial and temporal features, we develop a hybrid convolutional neural network (CNN) and recurrent neural network (RNN) model: a CNN extracts features from each axial slice of the CT scan, and this sequence of slice features is input to an RNN, which classifies the scan. Our model, termed ViPTT-Net, was pretrained on over 1300 video clips labeled with human activities and then fine-tuned on chest CT scans labeled with tuberculosis type. We find that pretraining on videos leads to better representations and significantly improves validation performance, raising the kappa score from 0.17 to 0.35, especially for under-represented classes. Our best method achieved 2nd place in the ImageCLEF 2021 Tuberculosis - TBT classification task with a kappa score of 0.20 on the final test set using only image information (without clinical meta-data). All code and models are made available.
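The hybrid architecture described above (per-slice CNN features fed as a sequence to an RNN) can be illustrated with a minimal NumPy sketch. This is a toy forward pass, not the actual ViPTT-Net: the filter count, hidden size, and 5-class output are illustrative assumptions, and the paper's pretrained CNN backbone is replaced here by a single random convolution layer for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(img, kernel):
    # Naive 2D "valid" convolution: one toy CNN layer over a single slice.
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def cnn_features(slice_img, kernels):
    # Conv -> ReLU -> global average pool: one feature per filter,
    # giving a fixed-length feature vector for each axial slice.
    return np.array([np.maximum(conv2d_valid(slice_img, k), 0).mean()
                     for k in kernels])

def rnn_classify(feature_seq, Wx, Wh, Wo):
    # Vanilla RNN over the slice-feature sequence; the last hidden
    # state summarizes the whole scan and is projected to class logits.
    h = np.zeros(Wh.shape[0])
    for f in feature_seq:
        h = np.tanh(Wx @ f + Wh @ h)
    logits = Wo @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()  # softmax over tuberculosis types

# Toy CT "scan": 8 axial slices of 16x16 (real scans are far larger).
scan = rng.standard_normal((8, 16, 16))
kernels = rng.standard_normal((4, 3, 3)) * 0.5   # 4 toy conv filters (assumed)
Wx = rng.standard_normal((6, 4)) * 0.5           # hidden size 6 (assumed)
Wh = rng.standard_normal((6, 6)) * 0.5
Wo = rng.standard_normal((5, 6)) * 0.5           # 5 output classes (assumed)

feature_seq = [cnn_features(s, kernels) for s in scan]  # CNN per slice
probs = rnn_classify(feature_seq, Wx, Wh, Wo)           # RNN over slices
print(probs.shape)
```

The key design point mirrored here is that the CNN is applied independently to every axial slice, so a video-pretrained 2D backbone can be reused, while the RNN captures inter-slice (the "temporal") structure.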