Multi-label image recognition is a challenging computer vision task with practical applications. Progress in this area, however, is often characterized by complicated methods, heavy computation, and a lack of intuitive explanations. To effectively capture the different spatial regions occupied by objects from different categories, we propose an embarrassingly simple module, named class-specific residual attention (CSRA). CSRA generates class-specific features for every category by computing a simple spatial attention score, and then combines them with the class-agnostic average pooling feature. CSRA achieves state-of-the-art results on multi-label recognition while being much simpler than existing methods. Furthermore, with only 4 lines of code, CSRA also leads to consistent improvement across many diverse pretrained models and datasets without any extra training. CSRA is easy to implement and computationally light, and it admits intuitive explanations and visualizations.
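The abstract describes CSRA only in words, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes the spatial attention score for a class is a softmax, over locations, of the dot product between each location's feature and that class's classifier vector, and that the class-specific feature is added to the average-pooled feature with a weight. The names `csra_logits`, `lam`, and `temperature` are hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def csra_logits(features, classifiers, lam=0.1, temperature=1.0):
    """Sketch of class-specific residual attention (assumed form).

    features:    list of d-dim vectors, one per spatial location.
    classifiers: list of d-dim vectors, one per class.
    lam, temperature: hypothetical hyperparameters, not given in the text.
    Returns one logit per class.
    """
    n = len(features)
    d = len(features[0])

    # Class-agnostic feature: average pooling over all spatial locations.
    avg = [sum(x[k] for x in features) / n for k in range(d)]

    logits = []
    for m in classifiers:
        # Spatial attention score: softmax over locations of the
        # (temperature-scaled) response of this class at each location.
        raw = [temperature * dot(x, m) for x in features]
        peak = max(raw)  # subtract the max for numerical stability
        weights = [math.exp(r - peak) for r in raw]
        total = sum(weights)
        attn = [w / total for w in weights]

        # Class-specific feature: attention-weighted sum of location features.
        specific = [sum(a * x[k] for a, x in zip(attn, features)) for k in range(d)]

        # Residual combination with the class-agnostic average feature.
        combined = [g + lam * s for g, s in zip(avg, specific)]
        logits.append(dot(m, combined))
    return logits
```

Under these assumptions, setting `lam=0` recovers the ordinary average-pooling classifier, which is one way to see the attention term as a residual added on top of the baseline.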