In this paper, we present a novel annotation and evaluation protocol for visual recognition. Unlike traditional settings, the protocol does not require the labeler/algorithm to annotate/recognize all targets (objects, parts, etc.) at once; instead, it issues a number of recognition instructions, and the algorithm recognizes targets on request. This mechanism brings two beneficial properties that reduce the annotation burden, namely, (i) variable granularity: different scenarios can have different levels of annotation; in particular, object parts can be labeled only on large and clear instances; (ii) being open-domain: new concepts can be added to the database at minimal cost. To handle the proposed setting, we maintain a knowledge base and design a query-based visual recognition framework that constructs queries on the fly based on the requests. We evaluate the recognition system on two mixed-annotated datasets, CPP and ADE20K, and demonstrate its promising ability to learn from partially labeled data and to adapt to new concepts with only text labels.
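To make the request-driven setting concrete, the following is a minimal Python sketch of the loop described above, where the model answers one recognition request at a time and queries are built on the fly from a text-based knowledge base. All names here (Request, KnowledgeBase, model.segment, and the field names) are hypothetical illustrations, not the paper's actual API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    kind: str                        # hypothetical request type, e.g. "instance" or "part"
    concept: str                     # concept name to look up in the knowledge base
    parent_id: Optional[int] = None  # instance to decompose into parts, if any

class KnowledgeBase:
    """Maps concept names to text prompts; adding a new concept
    costs only one text entry (the open-domain property)."""
    def __init__(self):
        self.prompts = {}

    def add(self, concept, prompt):
        self.prompts[concept] = prompt

    def query(self, concept):
        return self.prompts[concept]

def recognize_by_request(model, image, kb, requests):
    """Answer each request independently; targets that are never
    requested simply stay unlabeled (the variable-granularity property,
    e.g. parts are requested only on large, clear instances)."""
    results = []
    for req in requests:
        text_query = kb.query(req.concept)  # query constructed on the fly
        # model.segment is a placeholder for whatever query-conditioned
        # recognition module the framework uses
        mask = model.segment(image, text_query, parent=req.parent_id)
        results.append((req, mask))
    return results
```

Under this sketch, adapting to a new concept amounts to a single `kb.add(name, text_label)` call, which mirrors the abstract's claim that new concepts can be added with only text labels.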