We propose a novel crowd counting approach that leverages abundantly available unlabeled crowd imagery in a learning-to-rank framework. To induce a ranking of cropped images, we use the observation that any sub-image of a crowded scene image is guaranteed to contain the same number of persons as the super-image or fewer. This allows us to address the limited size of existing crowd counting datasets. We collect two crowd scene datasets from Google, one using keyword searches and the other using query-by-example image retrieval. We demonstrate how to learn efficiently from these unlabeled datasets by incorporating learning-to-rank in a multi-task network that simultaneously ranks images and estimates crowd density maps. Experiments on two of the most challenging crowd counting datasets show that our approach obtains state-of-the-art results.
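The ranking observation can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names `nested_crops` and `pairwise_ranking_hinge`, the center-crop strategy, the scale factor, and the pairwise hinge form of the loss are all assumptions chosen for illustration. The sketch builds a sequence of nested crops from one image and penalizes any predicted count for a smaller crop that exceeds the predicted count for a crop containing it.

```python
import numpy as np


def nested_crops(image, num_crops=3, scale=0.75):
    """Return center crops ordered largest to smallest, each inside the previous.

    Hypothetical helper: the cropping scheme and scale are illustrative choices.
    """
    h, w = image.shape[:2]
    cy, cx = h // 2, w // 2
    ch, cw = h, w
    crops = []
    for _ in range(num_crops):
        top = cy - ch // 2
        left = cx - cw // 2
        crops.append(image[top:top + ch, left:left + cw])
        # Shrink the crop size for the next, smaller sub-image.
        ch = max(1, int(ch * scale))
        cw = max(1, int(cw * scale))
    return crops


def pairwise_ranking_hinge(counts, margin=0.0):
    """Hinge penalty over all (larger crop, smaller crop) pairs.

    counts[i] is the predicted count for crop i, ordered largest crop first.
    A pair contributes when the smaller crop is predicted to hold more
    people than a crop that contains it (assumed loss form).
    """
    counts = np.asarray(counts, dtype=float)
    loss = 0.0
    for i in range(len(counts)):
        for j in range(i + 1, len(counts)):
            loss += max(0.0, counts[j] - counts[i] + margin)
    return loss


# Toy "density map": two people, one near the center and one in a corner.
density = np.zeros((8, 8))
density[4, 4] = 1.0
density[0, 0] = 1.0

crops = nested_crops(density)
true_counts = [float(c.sum()) for c in crops]
print(true_counts)
print(pairwise_ranking_hinge(true_counts))
```

Because the true counts of nested crops can never increase as the crop shrinks, the penalty is zero for a correct counter, so the ordering can act as a supervisory signal on images that have no count annotations.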