Screening Papanicolaou test samples effectively reduces cervical cancer-related mortality, but the lack of trained cytopathologists prevents its widespread adoption in low-resource settings. Developing AI algorithms suited to resource-constrained countries, e.g., deep learning models that analyze digitized cytology images, is therefore appealing. Although such algorithms have been successful, they come at the price of collecting large annotated training datasets, which is both costly and time-consuming. Our study shows that the large number of unlabeled images that can be sampled from digitized cytology slides provides fertile ground where self-supervised learning methods can thrive and even outperform off-the-shelf deep learning models on various downstream tasks. Along the same lines, we report improved performance and data efficiency using modern augmentation strategies.