We introduce and tackle the problem of zero-shot object detection (ZSD), which aims to detect object classes that are not observed during training. We work with a challenging set of object classes, not restricting ourselves to similar and/or fine-grained categories as in prior works on zero-shot classification. We follow a principled approach by first adapting visual-semantic embeddings for ZSD. We then discuss the problems associated with selecting a background class and motivate two background-aware approaches for learning robust detectors: one uses a fixed background class, while the other is based on iterative latent assignments. We also outline the challenge posed by a limited number of training classes and propose a solution based on dense sampling of the semantic label space, using auxiliary data with a large number of categories. We propose novel splits of two standard detection datasets, MSCOCO and VisualGenome, and present extensive empirical results to highlight the benefits of the proposed methods. We provide useful insights into the algorithm and conclude by posing some open questions to encourage further research.