The conventional standard for object detection uses a bounding box to represent each individual object instance. However, this is impractical in industry-relevant warehouse applications, owing to severe occlusions among groups of instances of the same category. In this paper, we propose a new task, i.e., simultaneous object localization and counting, abbreviated as Locount, which requires algorithms to localize groups of objects of interest together with the number of instances in each group. Yet no existing dataset or benchmark is designed for such a task. To this end, we collect a large-scale object localization and counting dataset with rich annotations in retail stores, consisting of 50,394 images with more than 1.9 million object instances in 140 categories. Together with this dataset, we provide a new evaluation protocol and split the data into training and testing subsets to fairly evaluate the performance of Locount algorithms, establishing a new benchmark for the Locount task. Moreover, we present a cascaded localization and counting network as a strong baseline, which progressively classifies and regresses the bounding boxes of objects while predicting the number of instances enclosed in each box, and is trained end-to-end. Extensive experiments on the proposed dataset demonstrate its significance, and an analysis of failure cases is provided to indicate future directions. The dataset is available at https://isrc.iscas.ac.cn/gitlab/research/locount-dataset.
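To make the Locount output format concrete, the sketch below represents each prediction or annotation as one box around a group of same-category instances plus the count of instances it encloses, and checks whether a prediction matches a ground-truth group. It is an illustration under stated assumptions, not the evaluation protocol released with the dataset: the names `GroupBox`, `count_accuracy` and `is_match`, the linear count-accuracy formula, and the 0.5 thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GroupBox:
    """A Locount-style record: one box around a group of same-category
    instances, plus the number of instances the box encloses."""
    category: str
    x1: float
    y1: float
    x2: float
    y2: float
    count: int


def iou(a: GroupBox, b: GroupBox) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_accuracy(pred: GroupBox, gt: GroupBox) -> float:
    """Hypothetical counting score in [0, 1]: 1.0 when the counts agree,
    decreasing linearly with the relative count error.
    Assumes every ground-truth group contains at least one instance."""
    return max(0.0, 1.0 - abs(pred.count - gt.count) / gt.count)


def is_match(pred: GroupBox, gt: GroupBox,
             iou_thr: float = 0.5, ac_thr: float = 0.5) -> bool:
    """Hypothetical true-positive test: same category, sufficient box
    overlap, and a sufficiently accurate instance count."""
    return (pred.category == gt.category
            and iou(pred, gt) >= iou_thr
            and count_accuracy(pred, gt) >= ac_thr)
```

For example, a ground-truth group `GroupBox("drink", 0, 0, 100, 50, count=12)` and a prediction `GroupBox("drink", 5, 0, 100, 50, count=10)` overlap with IoU 0.95 and have a count accuracy of about 0.83, so they match; the same box predicted with `count=3` fails the count check. Coupling localization and counting in one match test reflects the task's requirement that a box alone is not a correct answer unless its count is also close.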