Combining Metric Learning and Attention Heads For Accurate and Efficient Multilabel Image Classification

Multi-label image classification predicts a set of labels for a given image. Unlike multiclass classification, where each image is assigned exactly one label, this setup applies to a broader range of applications. In this work we revisit two popular approaches to multilabel classification: transformer-based heads and graph-based branches that process label-relation information. Although transformer-based heads are considered to achieve better results than graph-based branches, we argue that with a proper training strategy graph-based methods show only a small accuracy drop while spending fewer computational resources on inference. In our training strategy, instead of Asymmetric Loss (ASL), the de facto standard for multilabel classification, we introduce its metric learning modification. In each binary classification sub-problem it operates on $L_2$-normalized feature vectors coming from a backbone and pushes the angles between the normalized representations of positive and negative samples to be as large as possible. This provides better discrimination ability than binary cross-entropy loss does on unnormalized features. With the proposed loss and training strategy, we obtain SOTA results among single-modality methods on widely used multilabel classification benchmarks such as MS-COCO, PASCAL-VOC, NUS-Wide and Visual Genome 500. The source code of our method is available as part of the OpenVINO Training Extensions: https://github.com/openvinotoolkit/deep-object-reid/tree/multilabel
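The idea of scoring each binary sub-problem by an angle rather than by an unnormalized dot product can be illustrated with a minimal sketch. It is not the loss proposed in the paper: it omits the asymmetric weighting inherited from ASL and any margins, and it assumes one direction vector per label (`class_vectors`) and a scale factor (`scale`), both hypothetical names introduced here for illustration only.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 norm so that only its direction matters."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_logits(feature, class_vectors):
    """Cosine of the angle between the feature and each per-label vector."""
    unit_feature = l2_normalize(feature)
    return [
        sum(a * b for a, b in zip(unit_feature, l2_normalize(w)))
        for w in class_vectors
    ]


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def angular_bce_loss(feature, class_vectors, targets, scale=10.0):
    """Mean binary cross-entropy over labels, computed on scaled cosines.

    Assumption: `scale` sharpens the sigmoid because cosines lie in [-1, 1].
    Plain BCE is used here; no asymmetric focusing or margin is applied.
    """
    total = 0.0
    for cos_theta, target in zip(cosine_logits(feature, class_vectors), targets):
        z = scale * cos_theta
        # BCE with logit z: positives pay softplus(-z), negatives pay softplus(z).
        total += target * softplus(-z) + (1 - target) * softplus(z)
    return total / len(targets)
```

Because both the feature and the per-label vectors are normalized, rescaling the backbone output leaves the loss unchanged; the loss can only decrease by moving positives toward their label direction and negatives away from it, which is the angular separation the abstract describes.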
