Crowd images are arguably among the most laborious data to annotate. In this paper, we aim to reduce the massive demand for densely labeled crowd data and propose a novel weakly-supervised setting, in which the binary ranking of two images with highly contrasting crowd counts serves as the training guidance. To enable training under this new setting, we convert the crowd-count regression problem into a ranking-potential prediction problem. In particular, we tailor a Siamese Ranking Network that predicts potential scores for two images, indicating the ordering of their counts. The ultimate goal is therefore to assign appropriate potentials to all crowd images so that their orderings obey the ranking labels. However, potentials reveal relative crowd sizes but cannot yield an exact crowd count. We resolve this problem by introducing "anchors" during the inference stage. Concretely, anchors are a few images with count labels, used to map potential scores to the corresponding counts through a simple linear mapping function. We conduct extensive experiments to study various combinations of supervision, and we show that the proposed method outperforms existing weakly-supervised methods by a large margin without additional labeling effort.
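The abstract specifies neither the ranking loss nor the exact form of the mapping beyond "a simple linear mapping function", so the following is only a minimal sketch under stated assumptions. It assumes a RankNet-style pairwise logistic loss on the difference of the two potentials, a shared linear scorer standing in for the Siamese network's backbone, and an ordinary least-squares line fitted on the anchor images. All function names (`siamese_potentials`, `pairwise_ranking_loss`, `fit_anchor_mapping`, `potentials_to_counts`) are hypothetical.

```python
import numpy as np

def siamese_potentials(w, x_a, x_b):
    """Score both images of each pair with the SAME weights w (hypothetical linear scorer).

    Weight sharing across the two branches is what makes the model Siamese;
    a real model would use a CNN instead of this linear stand-in.
    x_a, x_b: feature arrays of shape (n_pairs, d); w: shape (d,).
    """
    return x_a @ w, x_b @ w

def pairwise_ranking_loss(p_a, p_b, a_has_more):
    """Assumed RankNet-style logistic loss on potential differences.

    a_has_more[i] is 1.0 if image A of pair i has the larger crowd count, else 0.0.
    The loss is log(1 + exp(-(p_a - p_b))) when A should rank higher and
    log(1 + exp(p_a - p_b)) otherwise; np.logaddexp keeps it numerically stable.
    """
    sign = np.where(np.asarray(a_has_more) > 0.5, 1.0, -1.0)
    diff = np.asarray(p_a) - np.asarray(p_b)
    return float(np.mean(np.logaddexp(0.0, -sign * diff)))

def fit_anchor_mapping(anchor_potentials, anchor_counts):
    """Fit count ~= slope * potential + intercept by least squares on the labeled anchors."""
    slope, intercept = np.polyfit(anchor_potentials, anchor_counts, deg=1)
    return slope, intercept

def potentials_to_counts(potentials, slope, intercept):
    """Convert predicted potentials of unlabeled images into count estimates."""
    return slope * np.asarray(potentials, dtype=float) + intercept
```

For example, three anchors with potentials `[0.0, 1.0, 2.0]` and counts `[10, 30, 50]` give a slope of 20 and an intercept of 10, so an unlabeled image with potential 1.5 maps to an estimated count of 40. Because the ranking loss depends only on potential differences, the potentials carry no absolute scale on their own, which is why a few labeled anchors are needed to recover counts.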