V2F-Net: Explicit Decomposition of Occluded Pedestrian Detection

Occlusion remains a major challenge in pedestrian detection. In this paper, we propose a simple yet effective method named V2F-Net, which explicitly decomposes occluded pedestrian detection into visible region detection and full body estimation. V2F-Net consists of two sub-networks: the Visible region Detection Network (VDN) and the Full body Estimation Network (FEN). VDN localizes visible regions, and FEN estimates the full-body box on the basis of the visible box. Moreover, to further improve full-body estimation, we propose a novel Embedding-based Part-aware Module (EPM). By supervising the visibility of each part, the network is encouraged to extract features that carry essential part information. We demonstrate the effectiveness of V2F-Net through experiments on two challenging datasets. Compared with the FPN baseline, V2F-Net achieves a 5.85% AP gain on CrowdHuman and a 2.24% MR-2 (log-average miss rate) improvement on CityPersons. In addition, consistent gains on both one-stage and two-stage detectors validate the generalizability of our method.
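The visible-to-full decomposition can be illustrated with a minimal sketch. The abstract does not say how FEN parameterizes the full-body box or how per-part visibility targets are defined, so both are assumptions here: the full-body box is expressed as standard R-CNN-style (dx, dy, dw, dh) deltas relative to the visible box, and the part labels split the full-body box into `num_parts` horizontal strips, marking a strip visible when the visible box covers at least `thresh` of its area. All function names and parameters are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Deltas = Tuple[float, float, float, float]  # (dx, dy, dw, dh)


def _center_size(box: Box) -> Tuple[float, float, float, float]:
    """Convert corner format to (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + 0.5 * w, y1 + 0.5 * h, w, h


def encode_full_from_visible(visible: Box, full: Box) -> Deltas:
    """Assumed regression target: full-body box as deltas relative to the visible box."""
    vcx, vcy, vw, vh = _center_size(visible)
    fcx, fcy, fw, fh = _center_size(full)
    return ((fcx - vcx) / vw, (fcy - vcy) / vh, math.log(fw / vw), math.log(fh / vh))


def decode_full_from_visible(visible: Box, deltas: Deltas) -> Box:
    """Inverse of the encoding: recover a full-body box from a visible box and deltas."""
    vcx, vcy, vw, vh = _center_size(visible)
    dx, dy, dw, dh = deltas
    cx, cy = vcx + dx * vw, vcy + dy * vh
    w, h = vw * math.exp(dw), vh * math.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def part_visibility_labels(full: Box, visible: Box,
                           num_parts: int = 4, thresh: float = 0.5) -> List[int]:
    """Hypothetical per-part targets: 1 if a horizontal strip of the full box
    is covered by the visible box for at least `thresh` of its area, else 0."""
    fx1, fy1, fx2, fy2 = full
    vx1, vy1, vx2, vy2 = visible
    strip_h = (fy2 - fy1) / num_parts
    # Horizontal overlap is the same for every strip.
    inter_w = max(0.0, min(fx2, vx2) - max(fx1, vx1))
    labels = []
    for k in range(num_parts):
        sy1 = fy1 + k * strip_h
        sy2 = sy1 + strip_h
        inter_h = max(0.0, min(sy2, vy2) - max(sy1, vy1))
        strip_area = (fx2 - fx1) * strip_h
        covered = inter_w * inter_h / strip_area if strip_area > 0 else 0.0
        labels.append(1 if covered >= thresh else 0)
    return labels
```

For a pedestrian whose lower half is occluded, say a visible box `(10, 0, 30, 50)` inside a full-body box `(10, 0, 30, 100)`, the encoded deltas are roughly `(0, 0.5, 0, log 2)`, decoding them recovers the full-body box, and the four strip labels are `[1, 1, 0, 0]`. Anchoring the full-body prediction on the visible box in this way means the regressor only has to infer the occluded extent, which is the intuition behind estimating the full body on the basis of the visible box.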
