We propose YolactEdge, the first competitive instance segmentation approach that runs on small edge devices at real-time speeds. Specifically, YolactEdge runs at up to 30.8 FPS on a Jetson AGX Xavier (and 172.7 FPS on an RTX 2080 Ti) with a ResNet-101 backbone on 550x550 resolution images. To achieve this, we make two improvements to the state-of-the-art image-based real-time method YOLACT: (1) applying TensorRT optimization while carefully trading off speed and accuracy, and (2) a novel feature warping module that exploits temporal redundancy in videos. Experiments on the YouTube VIS and MS COCO datasets demonstrate that YolactEdge yields a 3-5x speedup over existing real-time methods while maintaining competitive mask and box detection accuracy. We also conduct ablation studies to dissect our design choices and modules. Code and models are available at https://github.com/haotian-liu/yolact_edge.
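To make the second improvement concrete, the sketch below illustrates the general idea of flow-guided feature warping: backbone features computed on a previous "key" frame are resampled into the current frame's coordinates using an estimated optical flow, so the expensive backbone need not be re-run on every frame. This is a minimal illustration of the technique, not the authors' implementation; the tensor shapes, the `warp_features` helper, and the zero-flow usage example are assumptions for demonstration, and only standard PyTorch operations (`torch.meshgrid`, `F.grid_sample`) are used.

```python
# Minimal sketch of flow-based feature warping (not the YolactEdge module itself).
import torch
import torch.nn.functional as F

def warp_features(prev_feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp prev_feat (N, C, H, W) from a key frame into the current frame
    using flow (N, 2, H, W), where flow[:, 0] / flow[:, 1] are horizontal /
    vertical pixel displacements."""
    n, _, h, w = prev_feat.shape
    # Base sampling grid of pixel coordinates (x, y).
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=prev_feat.dtype),
        torch.arange(w, dtype=prev_feat.dtype),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=0).unsqueeze(0).to(prev_feat.device)  # (1, 2, H, W)
    # Displace the grid by the flow and normalize to [-1, 1] for grid_sample.
    coords = base + flow
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)  # (N, H, W, 2)
    return F.grid_sample(prev_feat, grid, mode="bilinear", align_corners=True)

# Hypothetical usage: reuse a key-frame feature map instead of recomputing it.
prev_feat = torch.randn(1, 256, 69, 69)  # e.g. one pyramid level from the key frame
flow = torch.zeros(1, 2, 69, 69)         # zero flow -> identity warp
curr_feat = warp_features(prev_feat, flow)
```

In this formulation, only a lightweight flow estimator runs on non-key frames, which is what allows temporal redundancy to translate into wall-clock savings on an edge device.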