Deploying deep learning models on embedded systems has been challenging due to limited computing resources. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, such as object detection, have not been adequately addressed. Compared with image classification, detection problems are more sensitive to the spatial variance of objects and therefore require specialized convolutions to aggregate spatial information. To address this need, recent work introduces dynamic deformable convolution to augment regular convolutions. However, this leads to inefficient memory access patterns on existing hardware. In this work, we harness the flexibility of FPGAs to develop a novel object detection pipeline with deformable convolutions. We show the speed-accuracy tradeoffs for a set of algorithm modifications, including irregular access versus limited-range and fixed-shape sampling. We then co-design a network, CoDeNet, with the modified deformable convolution and quantize it to 4-bit weights and 8-bit activations. With our high-efficiency implementation, our solution reaches 26.9 frames per second with a tiny model size of 0.76 MB while achieving 61.7 AP50 on the standard object detection dataset, Pascal VOC. With our higher-accuracy implementation, our model reaches 67.1 AP50 on Pascal VOC with only 2.9 MB of parameters, 20.9x smaller yet 10% more accurate than Tiny-YOLO.
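To make the deformable-convolution idea concrete, the following is a minimal NumPy sketch, not the paper's implementation: each output pixel samples the input at the regular 3x3 grid positions plus per-pixel learned (dy, dx) offsets, using bilinear interpolation for fractional coordinates. All function names here are illustrative; in practice the offsets would be predicted by a separate convolutional layer.

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Bilinearly sample a single-channel image at fractional (y, x), zero-padded."""
    H, W = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yi, xi = y0 + dy, x0 + dx
            if 0 <= yi < H and 0 <= xi < W:
                # Weight by distance to the neighboring integer grid point.
                val += (1 - abs(y - yi)) * (1 - abs(x - xi)) * img[yi, xi]
    return val

def deformable_conv2d(img, kernel, offsets):
    """3x3 deformable convolution on a single-channel image (zero padding).

    offsets has shape (H, W, 9, 2): a (dy, dx) shift added to each of the
    nine regular 3x3 sampling positions, per output pixel. With all-zero
    offsets this reduces to a standard 3x3 convolution.
    """
    H, W = img.shape
    out = np.zeros((H, W))
    grid = [(ky, kx) for ky in (-1, 0, 1) for kx in (-1, 0, 1)]
    for y in range(H):
        for x in range(W):
            acc = 0.0
            for k, (ky, kx) in enumerate(grid):
                dy, dx = offsets[y, x, k]
                acc += kernel[ky + 1, kx + 1] * bilinear_sample(
                    img, y + ky + dy, x + kx + dx)
            out[y, x] = acc
    return out
```

The data-dependent offsets are what make memory access irregular on fixed-function accelerators: the sampled addresses are not known until the offset map is computed, which motivates the limited-range and fixed-shape variants explored in the paper.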