Zero-shot detection (ZSD) is the challenging task of simultaneously recognizing and localizing objects, even when the model has not been trained with visual samples of a few target ("unseen") classes. Recently, methods based on generative models such as GANs have shown some of the best results: a GAN trained on seen-class data generates unseen-class samples from their semantics, enabling vanilla object detectors to recognize unseen objects. However, the problem of semantic confusion remains, where the model is sometimes unable to distinguish between semantically similar classes. In this work, we propose to train a generative model with a triplet loss that accounts for the degree of dissimilarity between classes and reflects it in the generated samples. In addition, a cyclic-consistency loss is enforced to ensure that the generated visual samples of a class correspond closely to that class's own semantics. Extensive experiments on two benchmark ZSD datasets, MSCOCO and PASCAL-VOC, demonstrate significant gains over current ZSD methods, reducing semantic confusion and improving detection of the unseen classes.
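The abstract does not give the exact form of the two losses, so the following is only a minimal sketch of how they might look. It assumes a hinge-style triplet loss whose margin is scaled by the cosine dissimilarity between class semantic vectors, and a mean-squared-error cyclic-consistency term. The function names, the margin scaling, and the distance choices are all illustrative assumptions, not the paper's formulation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_dissimilarity(u, v):
    """1 - cosine similarity; 0 for identical directions, 1 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative,
                 sem_anchor, sem_negative, base_margin=1.0):
    """Hinge triplet loss on visual features (hypothetical form).

    Assumption: the margin grows with the semantic dissimilarity between
    the anchor class and the negative class, so dissimilar classes are
    pushed further apart than similar ones.
    """
    margin = base_margin * cosine_dissimilarity(sem_anchor, sem_negative)
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def cycle_consistency_loss(predicted_semantics, true_semantics):
    """Mean squared error between the semantics regressed from a generated
    visual feature and the semantics it was generated from (assumed form)."""
    squared = [(p - t) ** 2 for p, t in zip(predicted_semantics, true_semantics)]
    return sum(squared) / len(true_semantics)
```

In a training loop, both terms would typically be added to the generator's adversarial objective with weighting coefficients; those weights are hyperparameters not specified here.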