Dataset quality plays a crucial role in the successful training and deployment of deep learning models. In the medical field in particular, where system performance may affect patient health, clean datasets are a safety requirement for reliable predictions. Outlier detection is therefore an essential step when building autonomous clinical decision systems. In this work, we assess the suitability of Self-Organizing Maps for outlier detection on a medical dataset containing quantitative phase images of white blood cells. We detect and evaluate outliers based on quantization errors and distance maps. Our findings confirm the suitability of Self-Organizing Maps for unsupervised out-of-distribution detection on the dataset at hand: they perform on par with a manually specified filter based on expert domain knowledge. They also show promise as a tool for exploring and cleaning medical datasets. As a direction for future research, we suggest combining Self-Organizing Maps with feature extraction based on deep learning.
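To illustrate the two scores named in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a small rectangular Self-Organizing Map, synthetic 4-dimensional feature vectors in place of the phase-image data, and a 99th-percentile threshold on the quantization error. All function names, grid sizes, and hyperparameters are hypothetical choices made for illustration, and fitting the map on the inlier portion only is a simplification of this sketch.

```python
import numpy as np


def train_som(data, rows=6, cols=6, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Fit a small rectangular SOM with online updates (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Initialise node weights from randomly chosen training samples.
    idx = rng.integers(0, len(data), rows * cols)
    weights = data[idx].astype(float).reshape(rows, cols, -1)
    # Grid coordinates of every node, used by the neighbourhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-2    # linearly decaying radius
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            g = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
            step += 1
    return weights


def quantization_errors(weights, data):
    """Distance from each sample to its best-matching unit."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return d.min(axis=1)


def distance_map(weights):
    """Mean distance of each node to its 4-connected grid neighbours, scaled to [0, 1]."""
    rows, cols, _ = weights.shape
    umat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            neigh = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    neigh.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            umat[i, j] = np.mean(neigh)
    return umat / umat.max()


# Synthetic stand-in data: 500 inliers and 10 shifted outliers (assumption).
rng = np.random.default_rng(42)
inliers = rng.normal(0.0, 1.0, size=(500, 4))
outliers = rng.normal(8.0, 1.0, size=(10, 4))
samples = np.vstack([inliers, outliers])

weights = train_som(inliers)                  # fit on the inlier portion only
qe = quantization_errors(weights, samples)    # per-sample outlier score
threshold = np.percentile(quantization_errors(weights, inliers), 99)
flagged = qe > threshold                      # True marks a suspected outlier
umat = distance_map(weights)                  # high values mark sparse map regions

print("flagged samples:", int(flagged.sum()))
```

In this sketch, a sample far from every map node receives a large quantization error and is flagged, while the distance map highlights nodes that sit far from their grid neighbours. The percentile threshold is an arbitrary cut-off; in practice it would need to be tuned or validated against expert labels.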