Multi-label image classification allows predicting a set of labels from a given image. Unlike multi-class classification, where only one label per image is assigned, this setup is applicable to a broader range of applications. In this work we revisit two popular approaches to multi-label classification: transformer-based heads and graph-based branches that process label-relation information. Although transformer-based heads are considered to achieve better results than graph-based branches, we argue that with a proper training strategy graph-based methods demonstrate only a small accuracy drop while spending fewer computational resources at inference. In our training strategy, instead of Asymmetric Loss (ASL), the de-facto standard for multi-label classification, we introduce a modification of it that acts in the angle space. It implicitly learns a proxy feature vector on the unit hypersphere for each class, providing better discrimination ability than binary cross-entropy loss applied to unnormalized features. With the proposed loss and training strategy, we obtain state-of-the-art results among single-modality methods on widespread multi-label classification benchmarks such as MS-COCO, PASCAL-VOC, NUS-WIDE and Visual Genome 500. The source code of our method is available as part of the OpenVINO Training Extensions: https://github.com/openvinotoolkit/deep-object-reid/tree/multilabel
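To make the idea concrete, the following is a minimal sketch (not the paper's exact formulation) of an ASL-style loss acting in the angle space: features and learned per-class proxy vectors are L2-normalized onto the unit hypersphere, logits are scaled cosine similarities, and ASL's asymmetric focusing down-weights easy negatives. The class name, hyperparameter values, and scaling factor here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AngularAsymmetricLoss(nn.Module):
    """Illustrative sketch: multi-label loss on cosine similarities
    between L2-normalized features and learned class proxies,
    combined with ASL-style asymmetric focusing."""

    def __init__(self, num_classes, feat_dim, scale=20.0,
                 gamma_pos=0.0, gamma_neg=4.0, eps=1e-8):
        super().__init__()
        # One learnable proxy vector per class; normalization in
        # forward() keeps the effective proxies on the unit hypersphere.
        self.proxies = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale
        self.gamma_pos = gamma_pos
        self.gamma_neg = gamma_neg
        self.eps = eps

    def forward(self, features, targets):
        # cos(theta) between each sample and each class proxy: (B, C)
        cos = F.linear(F.normalize(features), F.normalize(self.proxies))
        p = torch.sigmoid(self.scale * cos)
        # Asymmetric focusing as in ASL: easy negatives contribute little.
        loss_pos = targets * (1 - p) ** self.gamma_pos \
            * torch.log(p.clamp(min=self.eps))
        loss_neg = (1 - targets) * p ** self.gamma_neg \
            * torch.log((1 - p).clamp(min=self.eps))
        return -(loss_pos + loss_neg).mean()
```

Because the logits come from bounded cosine similarities rather than unnormalized dot products, class discrimination is driven by angular separation between features and proxies, which is the property the abstract attributes to the angle-space modification.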