Recently proposed few-shot image classification methods have generally focused on use cases where the objects to be classified are the central subject of images. Despite success on benchmark vision datasets aligned with this use case, these methods typically fail on densely-annotated, busy images: images common in the wild where objects of relevance are not the central subject, instead appearing occluded, small, or alongside incidental objects belonging to other classes of potential interest. To localize relevant objects, we employ a prototype-based few-shot segmentation model that compares the encoded features of unlabeled query images with support class centroids to produce region proposals indicating the presence and location of support set classes in a query image. These region proposals are then used as additional conditioning input to few-shot image classifiers. We develop a framework to unify the two stages (segmentation and classification) into an end-to-end classification model -- PRoPnet -- and empirically demonstrate that our methods improve accuracy on image datasets with natural scenes containing multiple object classes.
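A minimal sketch (not the authors' code) of the prototype-comparison step described above: support-class centroids are obtained by masked average pooling over encoded support features, compared against every spatial location of the query feature map via cosine similarity, and the resulting per-class similarity maps act as region proposals that condition a classifier. All tensor shapes, the shared encoder, and the fusion-by-concatenation choice are illustrative assumptions.

```python
# Sketch of prototype-based region proposals conditioning a few-shot classifier.
# Shapes, the fusion strategy, and the classifier head are assumptions.
import torch
import torch.nn.functional as F


def class_centroid(support_feats, support_masks):
    """support_feats: (K, C, H, W) encoded support images for one class.
    support_masks:  (K, 1, H, W) binary masks of that class in each support image.
    Returns a (C,) prototype via masked average pooling."""
    masked = support_feats * support_masks                      # zero out background
    return masked.sum(dim=(0, 2, 3)) / support_masks.sum().clamp(min=1e-6)


def region_proposals(query_feats, prototypes):
    """query_feats: (B, C, H, W); prototypes: (N, C), one centroid per support class.
    Returns (B, N, H, W) cosine-similarity maps indicating where each class appears."""
    q = F.normalize(query_feats, dim=1)                         # normalize channels
    p = F.normalize(prototypes, dim=1)                          # normalize prototypes
    return torch.einsum("bchw,nc->bnhw", q, p)


def conditioned_logits(query_feats, proposals, classifier_head):
    """Fuse proposals with query features (here by channel concatenation, an
    assumed fusion strategy) before the few-shot classification head."""
    fused = torch.cat([query_feats, proposals], dim=1)          # (B, C+N, H, W)
    pooled = fused.mean(dim=(2, 3))                             # global average pool
    return classifier_head(pooled)                              # (B, N) class scores
```

For example, with C=512 encoder channels and N=5 support classes, `classifier_head` could be a `torch.nn.Linear(512 + 5, 5)`; the concatenation is only one plausible way to condition the classifier on the proposals.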