Creating a dataset for training supervised machine learning algorithms can be a demanding task. This is especially true for medical image segmentation since one or more specialists are usually required for image annotation, and creating ground truth labels for just a single image can take up to several hours. In addition, it is paramount that the annotated samples represent well the different conditions that might affect the imaged tissues as well as possible changes in the image acquisition process. This can only be achieved by considering samples that are typical in the dataset as well as atypical, or even outlier, samples. We introduce VessMAP, a heterogeneous blood vessel segmentation dataset acquired by carefully sampling relevant images from a larger non-annotated dataset. A methodology was developed to select both prototypical and atypical samples from the base dataset, thus defining an assorted set of images that can be used for measuring the performance of segmentation algorithms on samples that are highly distinct from each other. To demonstrate the potential of the new dataset, we show that the validation performance of a neural network changes significantly depending on the splits used for training the network.
翻译:暂无翻译