Using search engines for web image retrieval is a tempting alternative to manual curation when creating an image dataset, but their main drawback remains the proportion of incorrect (noisy) samples retrieved. Previous works have shown these noisy samples to be a mixture of in-distribution (ID) samples, assigned to the incorrect category but sharing visual semantics with other classes in the dataset, and out-of-distribution (OOD) images, which share no semantic correlation with any category in the dataset. The latter are, in practice, the dominant type of noisy image retrieved. To tackle this noise duality, we propose a two-stage algorithm that starts with a detection step, where we use unsupervised contrastive feature learning to represent images in a feature space. We find that the alignment and uniformity principles of contrastive learning allow OOD samples to be linearly separated from ID samples on the unit hypersphere. We then spectrally embed the unsupervised representations using a fixed neighborhood size and apply outlier-sensitive clustering at the class level to detect the clean and OOD clusters as well as ID noisy outliers. We finally train a noise-robust neural network that corrects ID noise to the correct category and exploits OOD samples in a guided contrastive objective, clustering them to improve low-level features. Our algorithm improves the state-of-the-art results on synthetic-noise image datasets as well as real-world web-crawled data. Our work is fully reproducible: github.com/PaulAlbert31/SNCF.
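To make the detection stage concrete, below is a minimal sketch of how it could be implemented, assuming unsupervised contrastive features have already been extracted (e.g. with a SimCLR-style encoder). The function name `detect_noise`, the use of scikit-learn's `SpectralEmbedding` and `OPTICS`, all parameter values, and the rule of treating the largest per-class cluster as clean, secondary clusters as OOD, and OPTICS outliers as ID noise are illustrative assumptions, not the paper's exact method.

```python
# Minimal sketch of the detection stage under the assumptions stated above.
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import OPTICS

def detect_noise(features, labels, num_classes, n_neighbors=50):
    """Flag each sample as 'clean', 'id_noise', or 'ood' (hypothetical helper)."""
    # Project features onto the unit hypersphere, where the alignment and
    # uniformity of contrastive representations separate OOD from ID samples.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)

    # Spectral embedding built from a fixed-size nearest-neighbor graph.
    embedded = SpectralEmbedding(
        n_components=10, affinity="nearest_neighbors",
        n_neighbors=n_neighbors).fit_transform(feats)

    flags = np.empty(len(labels), dtype=object)
    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        # Outlier-sensitive clustering at the class level: OPTICS marks
        # sparse, isolated points with the label -1.
        cluster_ids = OPTICS(min_samples=5).fit_predict(embedded[idx])
        clusters, counts = np.unique(cluster_ids[cluster_ids >= 0],
                                     return_counts=True)
        # Assumption: the dominant cluster is clean; the rest are OOD.
        clean = clusters[np.argmax(counts)] if len(clusters) else -1
        for i, cid in zip(idx, cluster_ids):
            if cid == -1:
                flags[i] = "id_noise"   # outlier within its labeled class
            elif cid == clean:
                flags[i] = "clean"      # dominant, well-aligned cluster
            else:
                flags[i] = "ood"        # secondary cluster of OOD images
    return flags
```

In a second stage, the resulting flags would drive the noise-robust training: ID-noise samples are relabeled to the correct category, while OOD samples feed a guided contrastive objective.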