A completely annotated whole slide image dataset of canine breast cancer to aid human breast cancer research

Canine mammary carcinoma (CMC) has been used as a model to investigate the pathogenesis of human breast cancer and the same grading scheme is commonly used to assess tumor malignancy in both. One key component of this grading scheme is the density of mitotic figures (MF). Current publicly available datasets on human breast cancer only provide annotations for small subsets of whole slide images (WSIs). We present a novel dataset of 21 WSIs of CMC completely annotated for MF. For this, a pathologist screened all WSIs for potential MF and structures with a similar appearance. A second expert blindly assigned labels, and for non-matching labels, a third expert assigned the final labels. Additionally, we used machine learning to identify previously undetected MF. Finally, we performed representation learning and two-dimensional projection to further increase the consistency of the annotations. Our dataset consists of 13,907 MF and 36,379 hard negatives. We achieved a mean F1-score of 0.791 on the test set and of up to 0.696 on a human breast cancer dataset.
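The labeling workflow and the reported metric can be illustrated with a minimal sketch. It assumes the first pathologist's screening yields one label per candidate cell, that the second expert labels the same candidates blindly, and that a third expert is consulted only where the two disagree. The names `consensus_labels`, `tiebreak`, and `f1_score` are hypothetical and chosen for illustration; the abstract does not describe the authors' tooling or how the mean F1-score was averaged.

```python
from typing import Callable, List

MF = "mitotic_figure"
HARD_NEGATIVE = "hard_negative"


def consensus_labels(
    first: List[str],
    second: List[str],
    tiebreak: Callable[[int], str],
) -> List[str]:
    """Merge two independent label lists; a third expert resolves disagreements.

    `tiebreak(i)` stands in for the third expert's decision on candidate i
    (an assumption about how that decision would be supplied).
    """
    if len(first) != len(second):
        raise ValueError("both experts must label the same candidates")
    final = []
    for idx, (a, b) in enumerate(zip(first, second)):
        # Matching labels are accepted as-is; only mismatches go to the third expert.
        final.append(a if a == b else tiebreak(idx))
    return final


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1-score from true positive, false positive, and false negative counts."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    first_expert = [MF, HARD_NEGATIVE, MF]
    second_expert = [MF, MF, MF]
    # Hypothetical third expert who labels every disputed candidate a hard negative.
    print(consensus_labels(first_expert, second_expert, lambda i: HARD_NEGATIVE))
    print(f1_score(tp=8, fp=2, fn=2))
```

Only disputed candidates reach the third expert in this sketch, which keeps the third reader's workload proportional to the disagreement rate rather than to the total number of candidates.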
