Object counting is a seemingly simple task with diverse real-world applications. Most counting methods focus on counting instances of specific, known classes. Class-agnostic counting methods can generalise to unseen classes, but they require reference images to define the type of object to be counted, as well as instance annotations during training. We identify that counting is, at its core, a repetition-recognition task and show that a general feature space with global context is sufficient to enumerate instances in an image without a prior on the object type present. Specifically, we demonstrate that self-supervised vision transformer features combined with a lightweight count regression head achieve competitive results compared to other class-agnostic counting methods, without the need for point-level supervision or reference images. Our method thus facilitates counting in settings where the set of object classes is constantly changing. To the best of our knowledge, this is both the first reference-less class-agnostic counting method and the first weakly-supervised class-agnostic counting method.
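To make the idea of a count regression head trained from image-level counts concrete, the following is a minimal sketch and not the architecture used in this work. It assumes that a frozen self-supervised vision transformer has already produced one feature vector per image patch; the tiny hand-written feature lists below are stand-ins for those features. The names `count_head` and `train_step`, the ReLU-and-sum design, and the squared-error loss are all illustrative assumptions.

```python
# Hypothetical sketch: a count regression head trained with weak supervision,
# i.e. only the total count per image, with no point annotations or references.

def count_head(patch_features, weights, bias):
    """Predict a scalar count as the sum of non-negative per-patch scores."""
    total = 0.0
    for feat in patch_features:
        score = sum(w * x for w, x in zip(weights, feat)) + bias
        total += max(score, 0.0)  # ReLU keeps each patch contribution >= 0
    return total


def train_step(patch_features, target_count, weights, bias, lr):
    """One gradient step on the squared error between predicted and true count."""
    error = count_head(patch_features, weights, bias) - target_count
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for feat in patch_features:
        score = sum(w * x for w, x in zip(weights, feat)) + bias
        if score > 0.0:  # only patches with an active ReLU receive gradient
            for i, x in enumerate(feat):
                grad_w[i] += x
            grad_b += 1.0
    new_weights = [w - lr * 2.0 * error * g for w, g in zip(weights, grad_w)]
    new_bias = bias - lr * 2.0 * error * grad_b
    return new_weights, new_bias, error * error


# Stand-in for frozen backbone output: three patches, two feature dimensions.
features = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, bias = [1.0, 1.0], 0.0
for _ in range(20):
    weights, bias, loss = train_step(features, 3.0, weights, bias, lr=0.05)
print("predicted count:", count_head(features, weights, bias))
```

In this sketch the only training signal is the scalar target count, which is what weak supervision means here. Any global context would have to come from the backbone features themselves, since this toy head scores each patch independently.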