Training deep learning models that perform well on all classes of a medical dataset is a challenging task. Performance on some classes is often suboptimal because of the class imbalance that naturally comes with medical data. An effective way to tackle this problem is targeted active learning, in which data points belonging to the rare classes are iteratively added to the training data. However, existing active learning methods are ineffective at targeting rare classes in medical datasets. In this work, we propose Clinical (targeted aCtive Learning for ImbalaNced medICal imAge cLassification), a framework that uses submodular mutual information functions as acquisition functions to mine critical data points from rare classes. We apply our framework to a wide array of medical imaging datasets under a variety of real-world class imbalance scenarios, namely binary imbalance and long-tail imbalance. We show that Clinical outperforms state-of-the-art active learning methods by acquiring a diverse set of data points that belong to the rare classes.
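To make the acquisition step concrete, the following is a minimal sketch of greedy selection under one facility-location-style submodular mutual information score. It is an illustration only, not the implementation behind Clinical: the abstract does not say which submodular mutual information functions are used, and the names `flqmi`, `greedy_select`, and `eta`, the cosine similarity, and the toy feature vectors are all assumptions introduced here. The query set stands in for a handful of labeled rare-class exemplars.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def flqmi(selected, sim, eta=1.0):
    """Facility-location-style score of `selected` with respect to the query set.

    sim[i][q] is the similarity between unlabeled point i and query exemplar q.
    This particular form is an assumed example, not taken from the abstract.
    """
    if not selected:
        return 0.0
    n_query = len(sim[0])
    # How well the selection covers each rare-class exemplar.
    coverage = sum(max(sim[i][q] for i in selected) for q in range(n_query))
    # How relevant each selected point is to the query set.
    relevance = sum(max(sim[i]) for i in selected)
    return coverage + eta * relevance


def greedy_select(unlabeled, query, budget, eta=1.0):
    """Greedily pick `budget` indices of `unlabeled` by marginal gain."""
    sim = [[cosine(x, q) for q in query] for x in unlabeled]
    selected = []
    for _ in range(min(budget, len(unlabeled))):
        base = flqmi(selected, sim, eta)
        best, best_gain = None, -math.inf
        for i in range(len(unlabeled)):
            if i in selected:
                continue
            gain = flqmi(selected + [i], sim, eta) - base
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected


# Toy features: points 0-1 resemble the first rare exemplar, points 2-3 the
# second, and point 4 resembles neither.
unlabeled = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0]]
query = [[1.0, 0.0], [0.0, 1.0]]
picked = greedy_select(unlabeled, query, budget=2)
print(picked)
```

In this toy setting the coverage term rewards a selection that spreads across both rare exemplars rather than picking two near-duplicates, which is the sense in which such an acquisition function can favor a diverse set of rare-class points. In an active learning loop, the picked points would be labeled, added to the training data, and the model retrained before the next round.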