We consider low-shot counting of arbitrary semantic categories in an image using only a few annotated exemplars (few-shot) or no exemplars (zero-shot). The standard few-shot pipeline extracts appearance queries from the exemplars and matches them with image features to infer the object counts. Existing methods extract queries by feature pooling but neglect shape information (e.g., size and aspect ratio), which reduces object localization accuracy and degrades the count estimates. We propose a Low-shot Object Counting network with iterative prototype Adaptation (LOCA). Our main contribution is a new object prototype extraction module, which iteratively fuses the exemplar shape and appearance queries with image features. The module is easily adapted to the zero-shot scenario, enabling LOCA to cover the entire spectrum of low-shot counting problems. LOCA outperforms all recent state-of-the-art methods on the FSC147 benchmark by 20-30% in RMSE in the one-shot and few-shot settings and achieves state-of-the-art results in the zero-shot setting, while demonstrating better generalization capabilities.
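The pooling-based baseline that the abstract contrasts with LOCA can be sketched roughly as follows. This is a minimal NumPy illustration and not the LOCA module or any published implementation: the function names `pool_query` and `similarity_map`, the `(y0, x0, y1, x1)` box format, and the toy two-channel feature map are all assumptions made for the example. It shows one way in which average pooling can discard exemplar shape: two exemplars of very different size yield the same query vector.

```python
import numpy as np


def pool_query(features, box):
    """Average-pool a (C, H, W) feature map inside box = (y0, x0, y1, x1).

    Hypothetical helper: returns a single appearance vector of shape (C,).
    """
    y0, x0, y1, x1 = box
    return features[:, y0:y1, x0:x1].mean(axis=(1, 2))


def similarity_map(features, query):
    """Cosine similarity between one query and every spatial location."""
    f = features / (np.linalg.norm(features, axis=0, keepdims=True) + 1e-8)
    q = query / (np.linalg.norm(query) + 1e-8)
    return np.einsum("chw,c->hw", f, q)


# Toy feature map (assumed): channel 0 marks background, channel 1 marks objects.
features = np.zeros((2, 8, 8))
features[0] = 1.0

small_box = (1, 1, 2, 2)  # a 1x1 object
large_box = (4, 2, 8, 8)  # a 4x6 object
for y0, x0, y1, x1 in (small_box, large_box):
    features[0, y0:y1, x0:x1] = 0.0
    features[1, y0:y1, x0:x1] = 1.0

q_small = pool_query(features, small_box)
q_large = pool_query(features, large_box)

# Both queries describe appearance only; size and aspect ratio are gone.
print(np.allclose(q_small, q_large))

# Matching either query against the image gives the same response map.
sim = similarity_map(features, q_small)
print(int((sim > 0.5).sum()))
```

In this toy setting the response map highlights every object pixel, but nothing in the query indicates whether the highlighted area corresponds to a few large objects or many small ones. That is the kind of ambiguity that shape-aware queries are meant to reduce.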