Methods for object detection and segmentation often require abundant instance-level annotations for training, which are time-consuming and expensive to collect. To address this, the task of zero-shot object detection (or segmentation) aims to learn effective methods for identifying and localizing object instances of categories for which no supervision is available. Constructing architectures for these tasks requires choosing among a myriad of design options, ranging from the form of the class encoding used to transfer information from seen to unseen categories to the nature of the function being optimized for learning. In this work, we extensively study these design choices and carefully construct a simple yet extremely effective zero-shot recognition method. Through extensive object detection and segmentation experiments on the MSCOCO dataset, we show that our proposed method outperforms existing, considerably more complex architectures. Our findings and method, which we propose as a competitive future baseline, point towards the need to revisit some of the recent design trends in zero-shot detection and segmentation.