PolypGen: A multi-center polyp detection and segmentation dataset for generalisability assessment

Sharib Ali, Debesh Jha, Noha Ghatwary, Stefano Realdon, Renato Cannizzaro, Osama E. Salem, Dominique Lamarque, Christian Daul, Michael A. Riegler, Kim V. Anonsen, Andreas Petlund, Pål Halvorsen, Jens Rittscher, Thomas de Lange, James E. East

From arXiv, 15 pages

Polyps in the colon are widely known cancer precursors, identified by colonoscopy performed for diagnostic work-up of symptoms, colorectal cancer screening, or systematic surveillance of certain diseases. Whilst most polyps are benign, the number, size and surface structure of polyps are tightly linked to the risk of colon cancer. Colon polyps are frequently missed or incompletely removed because of their variable nature, the difficulty of delineating the abnormality, high recurrence rates and the anatomical topography of the colon. Several methods have been built to automate polyp detection and segmentation. However, most have not been tested rigorously on a large, purpose-built, multi-center dataset. As a result, they may overfit to a specific population and endoscopic surveillance setting and fail to generalise to data from other populations. To this end, we have curated a dataset from 6 different centers incorporating more than 300 patients. The dataset includes both single-frame and sequence data, with 3446 annotated polyp labels and precise delineation of polyp boundaries verified by six senior gastroenterologists. To our knowledge, this is the most comprehensive detection and pixel-level segmentation dataset curated by a team of computational scientists and expert gastroenterologists. The dataset originated as part of the EndoCV2021 challenge, which aimed to address generalisability in polyp detection and segmentation. In this paper, we provide comprehensive insight into data construction and annotation strategies, annotation quality assurance and technical validation for our extended EndoCV2021 dataset, which we refer to as PolypGen.

