Deep-learning-based object detectors can enhance the capabilities of smart camera systems in a wide spectrum of machine vision applications, including video surveillance, autonomous driving, robotics and drones, smart factories, and health monitoring. Pedestrian detection plays a key role in all of these applications, and deep learning can be used to construct accurate state-of-the-art detectors. However, such complex models do not scale easily and are not traditionally deployed on resource-constrained smart cameras for on-device processing, which offers significant advantages in situations where real-time monitoring and robustness are vital. Efficient neural networks not only enable mobile applications and on-device experiences but can also be a key enabler of privacy and security, allowing users to gain the benefits of neural networks without sending their data to a server for evaluation. This work addresses the challenge of achieving a good trade-off between accuracy and speed for efficient deployment of deep-learning-based pedestrian detection in smart camera applications. It introduces a computationally efficient architecture based on separable convolutions and integrates dense connections across layers and multi-scale feature fusion to improve representational capacity while decreasing the number of parameters and operations. In particular, the contributions of this work are the following: 1) an efficient backbone combining multi-scale feature operations, 2) a more elaborate loss function for improved localization, and 3) an anchor-less approach for detection. The proposed approach, called YOLOpeds, is evaluated on the PETS2009 surveillance dataset using 320x320 images. Overall, YOLOpeds provides sustained real-time operation at over 30 frames per second with detection rates of around 86%, outperforming existing deep learning models.
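To illustrate why separable convolutions reduce the number of parameters, the following minimal sketch compares the weight count of a standard convolution with that of a depthwise separable one (a depthwise k x k filter per input channel followed by a 1x1 pointwise convolution). The function names, the 3x3 kernel, and the 64 and 128 channel sizes are assumptions chosen for illustration; they are not taken from the YOLOpeds architecture, and bias terms are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable convolution (biases ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise


def reduction_ratio(k: int, c_in: int, c_out: int) -> float:
    """Separable / standard weight count; algebraically 1/c_out + 1/k^2."""
    return separable_conv_params(k, c_in, c_out) / standard_conv_params(k, c_in, c_out)


# Hypothetical layer sizes, chosen only for illustration.
print(standard_conv_params(3, 64, 128))    # 73728
print(separable_conv_params(3, 64, 128))   # 8768
print(f"{reduction_ratio(3, 64, 128):.3f}")  # 0.119
```

For a 3x3 kernel the ratio approaches 1/9 as the number of output channels grows, which is the usual source of the savings in parameters and multiply-accumulate operations.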