Deep learning has been successfully applied to many classification problems, including underwater challenges. However, a long-standing issue with deep learning is the need for large and consistently labeled datasets. Although current approaches in semi-supervised learning can decrease the required amount of annotated data by a factor of 10 or more, this line of research still assumes distinct classes. For underwater classification, and for uncurated real-world datasets in general, clean class boundaries often cannot be drawn because the images carry limited information and the depicted objects may be in transitional stages. As a result, different experts hold different opinions and produce fuzzy labels, which could also be considered ambiguous or divergent. We propose a novel framework for semi-supervised classification with such fuzzy labels. It is based on the idea of overclustering to detect substructures in these fuzzy labels. We propose a novel loss to improve the overclustering capability of our framework and show the benefit of overclustering for fuzzy labels. We show that our framework is superior to previous state-of-the-art semi-supervised methods when applied to real-world plankton data with fuzzy labels. Moreover, we acquire 5 to 10% more consistent predictions of substructures.