With the development of medical imaging, researchers have sought to build a class of datasets that removes the need for specialized medical knowledge, such as \text{MedMNIST} (v2). MedMNIST (v2) comprises a large number of small-sized (28 $\times$ 28 or 28 $\times$ 28 $\times$ 28) medical samples with corresponding expert annotations (class labels). Existing baseline models (Google AutoML Vision, ResNet-50+3D) reach an average accuracy of over 70\% on the MedMNIST (v2) datasets, which is comparable to the performance of expert decision-making. Nevertheless, we note two major obstacles to modeling on MedMNIST (v2): 1) cropping the raw images to low resolutions may discard information needed for effective recognition, making it difficult for the classifier to trace accurate decision boundaries; 2) the labelers' subjective judgment may introduce considerable uncertainty into the label space. To address these issues, we develop a Complex Mixer (C-Mixer) with a pre-training framework that alleviates insufficient information and label-space uncertainty by introducing an incentive imaginary matrix and a self-supervised scheme with random masking. Our method (incentive learning and self-supervised learning with masking) shows surprising potential on the standard MedMNIST (v2) datasets, on customized weakly supervised datasets, and on other image enhancement tasks.