在GI Endoscocpic 图像中采用不平衡数据集最佳深度单级分类法进行多类异常探测 (Multiclass Anomaly Detection in GI Endoscopic Images using Optimized Deep One-class Classification in an Imbalanced Dataset)

Wireless Capsule Endoscopy helps physicians examine the gastrointestinal (GI) tract noninvasively, with the cost of generating many images. Many available datasets, such as KID2 and Kvasir, suffer from imbalance issue which make it difficult to train an effective artificial intelligence (AI) system. Moreover, increasing number of classes makes the problem worse. In this study, an ensemble of one-class classifiers is used for detecting anomaly. This method focuses on learning single models using samples from only one class, and ensemble all models for multiclass classification. A total of 1,778 normal, 227 inflammation, 303 vascular diseases, and 44 polyp images have been used from the KID2 dataset. In the first step, deep features are extracted based on an autoencoder architecture from the preprocessed images. Then, these features are oversampled using Synthetic Minority Over-sampling Technique and clustered using Ordering Points to Identify the Clustering Structure. To create one-class classification model, the Support Vector Data Descriptions are trained on each cluster with the help of Ant Colony Optimization, which is also used for tuning clustering parameters for improving F1-score. This process is applied on each classes and ensemble of final models used for multiclass classification. The entire algorithm ran 5 times and obtained F1-score 96.3 +- 0.2% and macro-average F1-score 85.0 +- 0.4%, for anomaly detection and multiclass classification, respectively. The results are compared with GoogleNet, AlexNet, Resnet50, VGG16 and other published algorithms, and demonstrate that the proposed method is a competitive choice for multiclass class anomaly detection in GI images.

翻译：无线卡普勒 Endoscop 帮助医生检查胃肠道(GI), 其成本为生成许多图像。许多可用的数据集, 如 KID2 和 Kvasir, 都存在不平衡问题, 这使得难以培训有效的人工智能系统(AI) 。此外, 类数的增加使得问题更加严重。在此研究中, 使用单级分类器的集合来检测异常点。这个方法侧重于学习单类模型, 仅使用一个类样本, 并混合所有模型, 用于多级分类。从 KID2 数据集中共使用了1 778正常、 227 炎、 303 血管疾病和44 多元图象。在第一步, 根据预处理的图像的自动编码结构, 深度特性使得问题更加严重。然后, 这些特性被过度标注, 使用超标集的技术和集集点来识别组合结构。要创建一等分类模型, 使用支持性矢量数据说明, 将每个分类的支持性数据解数据解解解解数级数 3, 用于每类的高级分类的用于使用高级数据的高级和高级的数据的。用于的的的的的的高级和的的的的高级的的高级的的的的的高级高级的的的的和的的的的的的的的的的的的的的和的的的的的的的的的的的的的的的的和的的的的的的的和的的的的的的的的的的的的的的的的的的的的的的和的的的的的的的和的的的的的的的的的的的的的的的的的的

相关内容

异常检测

关注 102

在数据挖掘中，异常检测（英语：anomaly detection）对不符合预期模式或数据集中其他项目的项目、事件或观测值的识别。通常异常项目会转变成银行欺诈、结构缺陷、医疗问题、文本错误等类型的问题。异常也被称为离群值、新奇、噪声、偏差和例外。特别是在检测滥用与网络入侵时，有趣性对象往往不是罕见对象，但却是超出预料的突发活动。这种模式不遵循通常统计定义中把异常点看作是罕见对象，于是许多异常检测方法（特别是无监督的方法）将对此类数据失效，除非进行了合适的聚集。相反，聚类分析算法可能可以检测出这些模式形成的微聚类。有三大类异常检测方法。[1] 在假设数据集中大多数实例都是正常的前提下，无监督异常检测方法能通过寻找与其他数据最不匹配的实例来检测出未标记测试数据的异常。监督式异常检测方法需要一个已经被标记“正常”与“异常”的数据集，并涉及到训练分类器（与许多其他的统计分类问题的关键区别是异常检测的内在不均衡性）。半监督式异常检测方法根据一个给定的正常训练数据集创建一个表示正常行为的模型，然后检测由学习模型生成的测试实例的可能性。

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日