We present MedMNIST, a collection of 10 pre-processed open medical datasets. MedMNIST is standardized to perform classification tasks on lightweight 28×28 images, so using it requires no background knowledge in medical imaging. Covering the primary data modalities in medical image analysis, it is diverse in data scale (from 100 to 100,000 samples) and in task type (binary/multi-class classification, ordinal regression and multi-label classification). MedMNIST could be used for educational purposes, rapid prototyping, multi-modal machine learning or AutoML in medical image analysis. Moreover, the MedMNIST Classification Decathlon is designed to benchmark AutoML algorithms on all 10 datasets; we have compared several baseline methods, including open-source and commercial AutoML tools. The datasets, evaluation code and baseline methods for MedMNIST are publicly available at https://medmnist.github.io/.
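The abstract lists four task types. As a hedged illustration only (this is not the MedMNIST evaluation code), the sketch shows one common way a classifier's raw outputs could be turned into labels for each type. The function names, the single-logit head for binary tasks, the 0.5 threshold, and treating ordinal regression as an argmax over ordered classes are all assumptions made for this example.

```python
import numpy as np


def sigmoid(x):
    # Element-wise logistic function.
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    # Row-wise softmax; subtracting the row max keeps exp() numerically stable.
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def decode(logits, task, threshold=0.5):
    """Hypothetical helper: map logits of shape (n_samples, n_outputs) to labels."""
    logits = np.asarray(logits, dtype=float)
    if task == "binary":
        # Assumption: one logit per sample, thresholded into class 0 or 1.
        return (sigmoid(logits[:, 0]) >= threshold).astype(int)
    if task in ("multi-class", "ordinal-regression"):
        # Assumption: ordinal grades are handled as ordered classes,
        # so the most probable grade is selected.
        return softmax(logits).argmax(axis=1)
    if task == "multi-label":
        # Each label gets an independent yes/no decision.
        return (sigmoid(logits) >= threshold).astype(int)
    raise ValueError(f"unknown task: {task}")


print(decode([[2.0], [-1.0]], "binary").tolist())
print(decode([[0.1, 2.0, -1.0], [3.0, 0.0, 0.0]], "multi-class").tolist())
print(decode([[1.0, -2.0, 0.0]], "multi-label").tolist())
```

Multi-class and ordinal outputs yield exactly one label per sample. Multi-label outputs yield a 0/1 vector per sample, because several findings can be present in the same image.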