Canine mammary carcinoma (CMC) has been used as a model to investigate the pathogenesis of human breast cancer, and the same grading scheme is commonly used to assess tumor malignancy in both. One key component of this grading scheme is the density of mitotic figures (MF). Current publicly available datasets on human breast cancer only provide annotations for small subsets of whole slide images (WSIs). We present a novel dataset of 21 WSIs of CMC completely annotated for MF. To create it, a pathologist screened all WSIs for potential MF and structures with a similar appearance. A second expert blindly assigned labels, and for non-matching labels, a third expert assigned the final labels. Additionally, we used machine learning to identify previously undetected MF. Finally, we performed representation learning and two-dimensional projection to further increase the consistency of the annotations. Our dataset consists of 13,907 MF and 36,379 hard negatives. We achieved a mean F1-score of 0.791 on the test set and of up to 0.696 on a human breast cancer dataset.
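The three-expert labeling rule and the F1 metric mentioned above can be illustrated with a minimal Python sketch. The label names and the functions `consensus_label` and `f1_score` are hypothetical and chosen only for illustration. The sketch assumes each candidate cell carries one label per expert and that detections have already been counted as true positives, false positives, and false negatives. How detections are matched to annotations is not specified here.

```python
from typing import Optional

# Hypothetical label names, used only for this illustration.
MF = "mitotic_figure"
HARD_NEGATIVE = "hard_negative"


def consensus_label(first: str, second: str, third: Optional[str] = None) -> str:
    """Return the final label for one candidate cell.

    If the first two experts agree, their shared label is final.
    Otherwise the third expert's label decides.
    """
    if first == second:
        return first
    if third is None:
        raise ValueError("experts disagree; a third opinion is required")
    return third


def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no counts."""
    denominator = 2 * true_positives + false_positives + false_negatives
    return 0.0 if denominator == 0 else 2 * true_positives / denominator
```

For example, `consensus_label(MF, HARD_NEGATIVE, third=MF)` resolves a disagreement in favor of the third expert, and `f1_score(8, 2, 2)` evaluates the metric for 8 correct detections, 2 false detections, and 2 missed figures.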