Zero-shot object detection uses class semantic vectors to detect unseen classes (alongside seen ones) in an unconstrained test image. In this study, we identify the core challenge in this research area: how to synthesize robust region features for unseen objects that are as intra-class diverse and inter-class separable as real samples, so that strong unseen-object detectors can be trained on them. To address this challenge, we build a novel zero-shot object detection framework that contains an Intra-class Semantic Diverging component and an Inter-class Structure Preserving component. The former realizes a one-to-many mapping that produces diverse visual features from each class semantic vector, which prevents real unseen objects from being misclassified as image background. The latter keeps the synthesized features from becoming so scattered that they confuse the inter-class and foreground-background relationships. To demonstrate the effectiveness of the proposed approach, we conduct comprehensive experiments on the PASCAL VOC, COCO, and DIOR datasets. Notably, our approach achieves new state-of-the-art performance on PASCAL VOC and COCO, and it is the first study to carry out zero-shot object detection in remote sensing imagery.
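The one-to-many mapping from a class semantic vector to diverse visual features can be illustrated with a toy sketch. The abstract does not specify how the mapping is implemented, so everything below is an assumption made for illustration: the generator is a single random linear layer with a ReLU, diversity comes from concatenating Gaussian noise to the semantic vector, and the names `make_generator`, `synthesize`, and `mean_pairwise_distance` are hypothetical. It is not the framework proposed in this work.

```python
import math
import random


def make_generator(sem_dim, noise_dim, feat_dim, seed=0):
    """Build a toy generator with fixed random weights (illustrative assumption)."""
    rng = random.Random(seed)
    in_dim = sem_dim + noise_dim
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(feat_dim)]

    def generate(semantic, noise):
        # Condition on the class semantic vector by concatenating it with noise.
        x = list(semantic) + list(noise)
        # One linear layer followed by a ReLU.
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

    return generate


def synthesize(generate, semantic, n_samples, noise_dim, seed=1):
    """Draw several noise vectors so one semantic vector yields many features."""
    rng = random.Random(seed)
    return [
        generate(semantic, [rng.gauss(0.0, 1.0) for _ in range(noise_dim)])
        for _ in range(n_samples)
    ]


def mean_pairwise_distance(features):
    """Average Euclidean distance between all pairs: a crude diversity measure."""
    n = len(features)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.dist(features[i], features[j])
    return total / (n * (n - 1) / 2)


# Hypothetical 4-dimensional semantic vector for one unseen class.
semantic_vector = [0.2, -0.5, 1.0, 0.3]
generator = make_generator(sem_dim=4, noise_dim=3, feat_dim=8)
features = synthesize(generator, semantic_vector, n_samples=5, noise_dim=3)
diversity = mean_pairwise_distance(features)
```

With the noise fixed, the same semantic vector always maps to the same feature, so any intra-class diversity in this sketch comes entirely from the sampled noise. A trained model would additionally need an objective that keeps the features of different classes, and of background regions, separable.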