Inter-Species Cell Detection: Datasets on pulmonary hemosiderophages in equine, human and feline specimens

Christian Marzahl, Jenny Hill, Jason Stayt, Dorothee Bienzle, Lutz Welker, Frauke Wilm, Jörn Voigt, Marc Aubreville, Andreas Maier, Robert Klopfleisch, Katharina Breininger, Christof A. Bertram

From arXiv; submitted to Scientific Data

Pulmonary hemorrhage (P-Hem) occurs among multiple species and can have various causes. Cytology of bronchoalveolar lavage fluid (BALF) using a 5-tier scoring system of alveolar macrophages based on their hemosiderin content is considered the most sensitive diagnostic method. We introduce a novel, fully annotated multi-species P-Hem dataset which consists of 74 cytology whole slide images (WSIs) with equine, feline and human samples. To create this high-quality and high-quantity dataset, we developed an annotation pipeline combining human expertise with deep learning and data visualisation techniques. We applied a deep learning-based object detection approach, trained on 17 expertly annotated equine WSIs, to the remaining 39 equine, 12 human and 7 feline WSIs. The resulting annotations were semi-automatically screened for errors on multiple types of specialised annotation maps and finally reviewed by a trained pathologist. Our dataset contains a total of 297,383 hemosiderophages classified into five grades. It is one of the largest publicly available WSI datasets with respect to the number of annotations, the scanned area and the number of species covered.

