Semi-Supervised Learning (SSL) can reduce the amount of labeled image data required and thus the cost of deep learning. Most SSL methods assume a clear distinction between classes, but in many real-world datasets this distinction is not given because of intra- or interobserver variability. This variability can lead to different annotations for the same image, so many images have ambiguous annotations and their labels need to be considered "fuzzy". This fuzziness must be addressed because it limits the performance of SSL and of deep learning in general. We propose Semi-Supervised Classification & Clustering (S2C2), which can extend many deep SSL algorithms. S2C2 estimates the fuzziness of a label; it applies SSL as classification to images with certain labels while creating distinct clusters for images with similar but fuzzy labels. We show that, across multiple SSL algorithms and datasets, S2C2 yields a median 7.4% better F1-score for classification and a 5.4% lower inner distance of clusters, while being more interpretable thanks to its fuzziness estimation. Overall, combining SSL with S2C2 leads to better handling of fuzzy labels and thus of real-world datasets.
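To make the notion of a "fuzzy" label concrete, the following minimal sketch scores the disagreement between several annotations of one image and separates images with certain labels from those with fuzzy ones. It is an illustration only and not the S2C2 method: the normalized-entropy score, the threshold of 0.5, and the names `soft_label`, `fuzziness`, and `split_by_fuzziness` are assumptions made for this example.

```python
import math
from collections import Counter


def soft_label(votes, classes):
    """Turn a list of annotator votes into a class-probability distribution."""
    counts = Counter(votes)
    total = len(votes)
    return [counts[c] / total for c in classes]


def fuzziness(probs):
    """Normalized Shannon entropy in [0, 1].

    0 means all annotators agree; 1 means votes are spread uniformly.
    This score is an assumption of the sketch, not the paper's estimator.
    """
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


def split_by_fuzziness(annotations, classes, threshold=0.5):
    """Split images into certain ones (hard label) and fuzzy ones (soft label).

    The threshold is a hypothetical value chosen for illustration.
    """
    certain, fuzzy = {}, {}
    for image_id, votes in annotations.items():
        probs = soft_label(votes, classes)
        if fuzziness(probs) <= threshold:
            # Annotators largely agree: keep the majority class as a hard label.
            certain[image_id] = classes[probs.index(max(probs))]
        else:
            # Annotators disagree: keep the full distribution for clustering.
            fuzzy[image_id] = probs
    return certain, fuzzy


# Hypothetical annotations: four annotators per image.
annotations = {
    "a": ["cat", "cat", "cat", "cat"],
    "b": ["cat", "dog", "cat", "dog"],
    "c": ["cat", "cat", "cat", "dog"],
}
certain, fuzzy = split_by_fuzziness(annotations, ["cat", "dog"])
```

In this toy data, image "a" would be passed to a classifier with its hard label, while "b" and "c" keep their vote distributions and would be candidates for clustering with similarly ambiguous images.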