Scene classification has established itself as a challenging research problem. Compared to images of individual objects, scene images can be far more semantically complex and abstract. The two tasks differ mainly in the granularity of recognition. Yet image recognition remains a key pillar of good scene recognition performance, because the knowledge attained from object images can be used to recognize scenes accurately. Existing scene recognition methods take only the category label of the scene into consideration. However, we find that contextual information containing detailed local descriptions also helps make the scene recognition model more discriminative. In this paper, we aim to improve scene recognition using attribute and category label information encoded in objects. Based on the complementarity of attribute and category labels, we propose a Multi-task Attribute-Scene Recognition (MASR) network which learns a category embedding and at the same time predicts scene attributes. Attribute acquisition and object annotation are tedious and time-consuming tasks. We tackle this problem by proposing a partially supervised annotation strategy in which human intervention is significantly reduced. The strategy provides a much more cost-effective solution for real-world scenarios and requires considerably less annotation effort. Moreover, we re-weight the attribute predictions according to the level of importance indicated by the object detection scores. Using the proposed method, we efficiently annotate attribute labels for four large-scale datasets, and systematically investigate how scene and attribute recognition benefit from each other. The experimental results demonstrate that MASR learns a more discriminative representation and achieves competitive recognition performance compared to state-of-the-art methods.
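The multi-task idea described above, a scene-category loss combined with an attribute loss whose terms are re-weighted by object detection scores, can be illustrated with a minimal sketch. This is not the MASR implementation: the abstract does not give the loss formulation, so the softmax cross-entropy for the category, the binary cross-entropy for the attributes, the normalisation by total weight, the balancing factor `lam`, and all function names below are assumptions chosen for illustration.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over scene-category logits (assumed loss)."""
    m = max(logits)
    # log-sum-exp, shifted by the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def weighted_attribute_bce(logits, targets, weights):
    """Binary cross-entropy over attributes, each term scaled by a weight.

    `weights` stands in for object detection scores (an assumption about
    how the re-weighting could be applied).
    """
    total = 0.0
    for z, t, w in zip(logits, targets, weights):
        # Stable BCE with logits: max(z, 0) - z*t + log(1 + exp(-|z|))
        total += w * (max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z))))
    # Normalise by the total weight; guard against all-zero weights
    return total / max(sum(weights), 1e-12)


def multitask_loss(scene_logits, scene_label, attr_logits, attr_targets,
                   det_scores, lam=1.0):
    """Scene loss plus `lam` times the detection-weighted attribute loss."""
    scene = softmax_cross_entropy(scene_logits, scene_label)
    attr = weighted_attribute_bce(attr_logits, attr_targets, det_scores)
    return scene + lam * attr


if __name__ == "__main__":
    # Two scene classes, two attributes, hypothetical detection scores
    print(multitask_loss([2.0, 0.5], 0, [1.5, -0.3], [1, 0], [0.9, 0.3]))
```

In this sketch, an attribute tied to a confidently detected object contributes more to the attribute loss than one tied to a weak detection, which is one plausible reading of re-weighting predictions by detection score.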