Mask Focal Loss: A Unifying Framework for Dense Crowd Counting with Canonical Object Detection Networks

As a fundamental computer vision task, crowd counting predicts the number of pedestrians in a scene and plays an important role in risk perception and early warning, traffic control, and scene statistical analysis. Deep-learning-based head detection is currently a promising approach to crowd counting. However, widely used object detection networks cannot be applied well to this field, for three reasons: (1) sample imbalance has not yet been overcome in highly dense and complex scenes, because existing loss functions calculate the positive loss either at a single key point or over the entire target area with the same weight for all pixels; (2) the loss calculation of canonical object detectors is a hard assignment that does not take into account the spatial coherence from the object location to the background region; and (3) most existing head detection datasets are annotated only with center points rather than bounding boxes, which are mandatory for canonical detectors. To address these problems, we propose a novel loss function, called Mask Focal Loss (MFL), which redefines the loss contributions according to the in-situ value of a Gaussian-kernel heatmap. MFL provides a unifying framework for loss functions based on both heatmap and binary feature map ground truths. In addition, for better evaluation and comparison, we build a new synthetic dataset, GTA_Head, comprising 35 sequences, 5,096 images, and 1,732,043 head labels with bounding boxes. Experimental results show strong performance and demonstrate that the proposed MFL framework is applicable to all of the canonical detectors and to various datasets with different annotation patterns. This work provides a strong baseline for surpassing crowd counting methods based on density estimation.
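The abstract does not give the MFL formula, so the following is only a minimal sketch of the general idea it builds on: rendering center-point annotations as a Gaussian heatmap and weighting each pixel's loss by the heatmap value at that pixel. The loss shown is the penalty-reduced focal loss used by CornerNet-style keypoint detectors, not the paper's MFL. The function names, `sigma`, `alpha`, and `beta` are assumptions chosen for illustration.

```python
import math


def gaussian_heatmap(height, width, centers, sigma=1.0):
    """Render (row, col) head centers as a heatmap in [0, 1].

    Each pixel takes the maximum over the Gaussians of all centers,
    so every annotated center has the value 1.0 exactly.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for cy, cx in centers:
        for y in range(height):
            for x in range(width):
                dist_sq = (y - cy) ** 2 + (x - cx) ** 2
                value = math.exp(-dist_sq / (2.0 * sigma ** 2))
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


def heatmap_weighted_focal_loss(pred, heatmap, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss (CornerNet style), NOT the paper's MFL.

    Pixels at a heatmap peak (value 1.0) are positives. All other pixels
    are negatives whose penalty is scaled down by (1 - heatmap)^beta, so
    pixels near a head center are penalized less than distant background.
    """
    total = 0.0
    num_pos = 0
    for pred_row, heat_row in zip(pred, heatmap):
        for p, h in zip(pred_row, heat_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            if h == 1.0:
                total -= (1.0 - p) ** alpha * math.log(p)
                num_pos += 1
            else:
                total -= (1.0 - h) ** beta * p ** alpha * math.log(1.0 - p)
    # Normalize by the number of annotated centers (at least 1).
    return total / max(num_pos, 1)


# Hypothetical usage: one head annotated at the center of a 5x5 grid.
target = gaussian_heatmap(5, 5, [(2, 2)], sigma=1.0)
good_pred = [row[:] for row in target]             # prediction matches target
bad_pred = [[1.0 - v for v in row] for row in target]  # prediction inverted
loss_good = heatmap_weighted_focal_loss(good_pred, target)
loss_bad = heatmap_weighted_focal_loss(bad_pred, target)
```

In this sketch a prediction that matches the heatmap yields a much smaller loss than an inverted one, and the soft negative weighting is one way to respect the spatial coherence between an object location and its surrounding background that hard assignment ignores.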
