The rapid development and wide utilization of object detection techniques have drawn increasing attention to both the accuracy and speed of object detectors. However, current state-of-the-art object detection works are either accuracy-oriented, using large models at the cost of high latency, or speed-oriented, using lightweight models at the cost of accuracy. In this work, we propose YOLObile, a framework for real-time object detection on mobile devices via compression-compilation co-design. A novel block-punched pruning scheme is proposed that applies to kernels of any size. To improve computational efficiency on mobile devices, a GPU-CPU collaborative scheme is adopted together with advanced compiler-assisted optimizations. Experimental results indicate that our pruning scheme achieves a 14$\times$ compression rate on YOLOv4 with 49.0 mAP. Under our YOLObile framework, we achieve 17 FPS inference speed on the GPU of a Samsung Galaxy S20. By incorporating our proposed GPU-CPU collaborative scheme, the inference speed increases to 19.1 FPS, outperforming the original YOLOv4 by a 5$\times$ speedup. Source code is at: \url{https://github.com/nightsnack/YOLObile}.
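To illustrate the idea behind block-punched pruning, here is a minimal NumPy sketch: a layer's 2-D weight matrix (e.g. a convolution kernel flattened to out_channels $\times$ in_channels$\cdot k \cdot k$) is partitioned into blocks, and within each block the same positions are "punched" (zeroed) across all of the block's rows. The block sizes, the column-wise L2-norm selection criterion, and the function name are assumptions for illustration only, not the paper's exact procedure.

```python
import numpy as np

def block_punched_prune(weight, block_rows=4, block_cols=16, punch_ratio=0.5):
    """Illustrative (hypothetical) block-punched pruning sketch.

    Partitions a 2-D weight matrix into block_rows x block_cols blocks;
    within each block, zeroes a punch_ratio fraction of columns (the same
    punched positions for every row of the block), keeping the columns
    with the largest L2 norm.
    """
    pruned = weight.copy()
    rows, cols = pruned.shape
    for r in range(0, rows, block_rows):
        for c in range(0, cols, block_cols):
            block = pruned[r:r + block_rows, c:c + block_cols]  # view into pruned
            # Column importance = L2 norm over the block's rows.
            norms = np.linalg.norm(block, axis=0)
            # Zero the weakest punch_ratio fraction of columns in this block.
            n_punch = int(block.shape[1] * punch_ratio)
            punched = np.argsort(norms)[:n_punch]
            block[:, punched] = 0.0
    return pruned
```

Because the punched positions are shared within each block, the surviving weights keep a regular structure that hardware and compiler code generation can exploit, unlike fully irregular (unstructured) pruning.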