Although deep-learning-based methods for monocular pedestrian detection have made great progress, they remain vulnerable to heavy occlusion. Fusing multi-view information is a potential solution, but its application is limited by the scarcity of annotated training samples in existing multi-view datasets, which increases the risk of overfitting. To address this problem, a data augmentation method is proposed that randomly generates 3D cylindrical occlusions on the ground plane, sized to match an average pedestrian, and projects them into each view, thereby reducing the impact of overfitting during training. Moreover, the feature map of each view is projected via homographies onto multiple parallel planes at different heights, which allows the CNNs to exploit features along the full height of each pedestrian when inferring pedestrian locations on the ground plane. The proposed 3DROM method substantially outperforms state-of-the-art deep-learning-based methods for multi-view pedestrian detection.
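The random-occlusion augmentation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `random_cylinder_mask`, the pedestrian radius/height values, and the choice of masking the projected bounding box (rather than the exact projected cylinder silhouette) are all assumptions made for clarity.

```python
import numpy as np

def random_cylinder_mask(P, img_shape, x_range, y_range,
                         radius=0.3, height=1.75, rng=None):
    """Drop a pedestrian-sized cylinder at a random ground location and
    return its occlusion mask in one view.

    P is a 3x4 camera projection matrix; radius/height are assumed
    average pedestrian dimensions (hypothetical values).  Masking the
    projected bounding box is a simplification of projecting the full
    cylinder silhouette.
    """
    rng = np.random.default_rng() if rng is None else rng
    cx = rng.uniform(*x_range)
    cy = rng.uniform(*y_range)

    # Sample points on the bottom (Z = 0) and top (Z = height) rims.
    ang = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
    rim = np.stack([cx + radius * np.cos(ang),
                    cy + radius * np.sin(ang)], axis=1)
    pts = np.concatenate([np.c_[rim, np.zeros(len(rim))],
                          np.c_[rim, np.full(len(rim), height)]])

    # Project the 3D rim points into the image.
    hom = np.c_[pts, np.ones(len(pts))] @ P.T
    uv = hom[:, :2] / hom[:, 2:3]

    # Fill the projected bounding box as the occlusion mask.
    h, w = img_shape
    u0, v0 = np.clip(uv.min(0), 0, [w - 1, h - 1]).astype(int)
    u1, v1 = np.clip(uv.max(0), 0, [w - 1, h - 1]).astype(int)
    mask = np.zeros((h, w), dtype=bool)
    mask[v0:v1 + 1, u0:u1 + 1] = True
    return mask
```

In training, the same cylinder would be projected into every view with that view's projection matrix, so the synthetic occluder is geometrically consistent across cameras.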
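The multi-layer projection can be illustrated with the standard plane-induced homography: once the height h is fixed, the map from ground coordinates (X, Y) on the plane Z = h to image pixels is a 3x3 homography built from the columns of the 3x4 projection matrix P, since a world point (X, Y, h, 1) projects to P[:,0]·X + P[:,1]·Y + (P[:,2]·h + P[:,3]). A minimal NumPy sketch (the projection matrix below is a toy example for illustration, not taken from the paper's datasets):

```python
import numpy as np

def plane_homography(P, h):
    """Homography mapping plane coordinates (X, Y, 1) on the world plane
    Z = h to homogeneous image pixels, given a 3x4 projection matrix P.

    For fixed Z = h, P @ (X, Y, h, 1) is linear in (X, Y, 1), so the
    projection collapses to a 3x3 matrix.
    """
    return np.stack([P[:, 0], P[:, 1], P[:, 2] * h + P[:, 3]], axis=1)

# Toy intrinsics/extrinsics for illustration (hypothetical values).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])

# One homography per height layer (e.g. feet, torso, head levels);
# warping each view's feature map with the inverse of H stacks features
# from different body heights onto aligned ground-plane grids.
for h in [0.0, 0.9, 1.8]:
    H = plane_homography(P, h)
    p = H @ np.array([1.0, -2.0, 1.0])   # plane point (1, -2) at height h
    u, v = p[:2] / p[2]                  # pixel coordinates in this view
```

In practice a framework's projective warp (e.g. a spatial transformer or `cv2.warpPerspective`-style resampling) would apply the inverse homography to each view's feature map, one layer per height.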