Because scenes exhibit high inter-class similarity, caused by complex compositions within scenes and objects co-existing across scenes, various studies have exploited object-level semantic knowledge to improve scene recognition. However, these object-assisted approaches rely on semantic segmentation or object detection, which demand heavy computation and burden the network considerably, often making them incompatible with edge devices. In contrast, this paper proposes a semantic-based similarity prototype that helps the scene recognition network achieve higher accuracy without increasing its parameter count. The prototype is simple and can be plugged into existing pipelines. More specifically, a statistical strategy is introduced to summarize the semantic knowledge in scenes as class-level semantic representations. These representations are then used to explore inter-class correlations and ultimately to construct a similarity prototype. Furthermore, we propose two ways to use the similarity prototype to support network training: gradient label softening and batch-level contrastive loss. Comprehensive evaluations on multiple benchmarks show that our similarity prototype enhances the performance of existing networks without adding any computational burden. Code and the statistical similarity prototype will be available soon.
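To make the pipeline concrete, the sketch below illustrates one plausible reading of the abstract, not the paper's actual method: class-level semantic representations are taken to be per-class object-occurrence statistics (the array shapes, the random data, and the `soften` helper are all illustrative assumptions), a similarity prototype is built as their pairwise cosine similarity, and the prototype is then used to soften one-hot labels so that similar classes share probability mass.

```python
import numpy as np

# Hypothetical class-level semantic representations: for each scene class,
# the frequency of each object category across its training images.
# Shapes and values are illustrative only.
NUM_CLASSES, NUM_OBJECTS = 4, 6
rng = np.random.default_rng(0)
class_repr = rng.random((NUM_CLASSES, NUM_OBJECTS))

# Similarity prototype: pairwise cosine similarity between the class-level
# semantic representations (a symmetric NUM_CLASSES x NUM_CLASSES matrix).
unit = class_repr / np.linalg.norm(class_repr, axis=1, keepdims=True)
prototype = unit @ unit.T

def soften(label: int, alpha: float = 0.1) -> np.ndarray:
    """Blend a one-hot label with its class's similarity row, so that
    confusable classes receive a share of the probability mass."""
    row = prototype[label].copy()
    row[label] = 0.0          # exclude self-similarity
    row /= row.sum()          # normalize mass distributed to other classes
    target = np.zeros(NUM_CLASSES)
    target[label] = 1.0 - alpha
    return target + alpha * row

soft = soften(2)  # a valid probability distribution peaked at class 2
```

The softened target can replace the one-hot label in a standard cross-entropy loss, which is one way a similarity prototype can guide training without adding inference-time cost.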