Man-made scenes can be densely packed, containing numerous objects, often identical, positioned in close proximity. We show that precise object detection in such scenes remains a challenging frontier even for state-of-the-art object detectors. We propose a novel deep-learning-based method for precise object detection, designed for such challenging settings. Our contributions include: (1) a layer that estimates the Jaccard index (intersection over union) of each detection as its quality score; (2) a novel EM merging unit, which uses our quality scores to resolve detection overlap ambiguities; and finally, (3) an extensive, annotated data set, SKU-110K, representing packed retail environments and released for training and testing under such extreme settings. Detection tests on SKU-110K and counting tests on the CARPK and PUCPR+ benchmarks show that our method outperforms the existing state of the art by substantial margins. The code and data will be made available at \url{www.github.com/eg4000/SKU110K_CVPR19}.
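To make contribution (1) concrete, the sketch below computes the Jaccard index (intersection over union) between two axis-aligned boxes. This is the quantity a quality-scoring layer would be trained to predict. It is a minimal illustration, not the authors' implementation. It assumes boxes are given as `(x1, y1, x2, y2)` corner tuples. It also assumes that the training target for each detection is its best overlap with any ground-truth box. The helper names `jaccard_index` and `quality_targets` are hypothetical.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def jaccard_index(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (hypothetical helper)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


def quality_targets(detections: List[Box], ground_truth: List[Box]) -> List[float]:
    """Assumed regression target per detection: its best IoU with any ground-truth box."""
    return [
        max((jaccard_index(d, g) for g in ground_truth), default=0.0)
        for d in detections
    ]


if __name__ == "__main__":
    # Two 2x2 boxes sharing a 1x1 corner: IoU = 1 / (4 + 4 - 1) = 1/7.
    print(jaccard_index((0, 0, 2, 2), (1, 1, 3, 3)))
    print(quality_targets([(0, 0, 2, 2)], [(1, 1, 3, 3), (0, 0, 2, 2)]))
```

In densely packed scenes, neighboring detections of near-identical objects often overlap heavily. A per-detection estimate of this overlap with the true object therefore gives a merging step more information than the classification confidence alone.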