Because of the bird's-eye view, aerial scenes are more complicated than natural scenes in terms of object distribution and spatial arrangement, so learning a discriminative scene representation remains challenging. Recent solutions design \textit{local semantic descriptors} so that regions of interest (RoIs) can be properly highlighted. However, each local descriptor has limited descriptive capability, and the overall scene representation still needs refinement. In this paper, we address this problem by designing a novel representation set named the \textit{instance representation bank} (IRB), which unifies multiple local descriptors under the multiple instance learning (MIL) formulation. This unification is non-trivial: it allows all the local semantic descriptors to be aligned to the same scene scheme, which enhances the scene representation capability. Specifically, our IRB learning framework consists of a backbone, an instance representation bank, a semantic fusion module, and a scene scheme alignment loss function. All components are organized in an end-to-end manner. Extensive experiments on three aerial scene benchmarks demonstrate that the proposed method outperforms state-of-the-art approaches by a large margin.
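To make the MIL formulation in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the backbone has already produced per-region class scores (the "instances"), that each member of the bank is a different pooling rule that turns instance scores into a scene-level ("bag") prediction, that fusion is a plain average, and that alignment means supervising every bank member with the same scene label. All function names, the choice of pooling rules, and the toy scores are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Three hypothetical MIL pooling rules. Each maps per-region class scores
# (a list of regions, each a list of class scores) to one score per class.
def mean_pool(instance_scores):
    n = len(instance_scores)
    return [sum(col) / n for col in zip(*instance_scores)]


def max_pool(instance_scores):
    return [max(col) for col in zip(*instance_scores)]


def softmax_weighted_pool(instance_scores):
    # Per class, weight each region by the softmax of its own score.
    pooled = []
    for col in zip(*instance_scores):
        weights = softmax(list(col))
        pooled.append(sum(w * s for w, s in zip(weights, col)))
    return pooled


def cross_entropy(bag_scores, label):
    """Cross-entropy of the softmax over bag scores against a scene label."""
    return -math.log(softmax(bag_scores)[label])


def bank_forward(instance_scores, label):
    # Assumption: the "bank" is a set of pooling rules over the same instances.
    bank = {
        "mean": mean_pool,
        "max": max_pool,
        "soft": softmax_weighted_pool,
    }
    bag_scores = {name: pool(instance_scores) for name, pool in bank.items()}

    # Assumption: semantic fusion is a plain average of the bag predictions.
    num_classes = len(instance_scores[0])
    fused = [
        sum(scores[c] for scores in bag_scores.values()) / len(bag_scores)
        for c in range(num_classes)
    ]

    # Assumption: alignment supervises every bank member with the same label.
    alignment_loss = sum(cross_entropy(s, label) for s in bag_scores.values())
    return fused, alignment_loss


# Toy input: 4 regions x 3 scene classes (made-up numbers).
scores = [
    [0.1, 2.0, -1.0],
    [0.0, 1.5, -0.5],
    [0.3, 0.2, 0.1],
    [-0.2, 3.0, 0.0],
]
fused, loss = bank_forward(scores, label=1)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

In a trainable version, the instance scores would come from a convolutional backbone and the loss would be minimized by gradient descent; the sketch only shows how several bag-level predictions can share one supervision signal.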