Moving object segmentation is a crucial task for autonomous vehicles, as it can be used to segment objects in a class-agnostic manner based on their motion cues. It enables the detection of objects not seen during training (e.g., a moose or a construction truck) based on their motion and independent of their appearance. Although pixel-wise motion segmentation has been studied in the autonomous driving literature, it has rarely been addressed at the instance level, which would help separate connected segments of moving objects and lead to better trajectory planning. As the main issue is the lack of large public datasets, we create a new InstanceMotSeg dataset comprising 12.9K samples, improving upon our KITTIMoSeg dataset. In addition to providing instance-level annotations, we add 4 additional classes, which are crucial for studying class-agnostic motion segmentation. We adapt YOLACT and implement a motion-based class-agnostic instance segmentation model that acts as a baseline for the dataset. We also extend it to an efficient multi-task model that additionally provides semantic instance segmentation while sharing the encoder. The model then learns separate prototype coefficients within the class-agnostic and semantic heads, providing two independent paths of object detection for redundant safety. To obtain real-time performance, we study different efficient encoders and obtain 39 fps on a Titan Xp GPU using MobileNetV2, with an improvement of 10% mAP relative to the baseline. Our model improves on the previous state-of-the-art motion segmentation method by 3.3%. The dataset and a qualitative results video are available on our website at https://sites.google.com/view/instancemotseg/.
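The role of the separate prototype coefficients can be illustrated with a minimal sketch of YOLACT-style mask assembly, in which each instance mask is a sigmoid applied to a linear combination of prototype maps. This is not the authors' implementation: the function name `assemble_masks`, the toy 2x2 prototypes, the coefficient values, and the assumption that both heads combine the same prototype maps are all hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic function used to squash mask logits into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def assemble_masks(prototypes, coefficients):
    """Combine k prototype maps into one mask per instance.

    prototypes:   list of k maps, each an h x w nested list of floats.
    coefficients: list of n coefficient vectors, each of length k.
    Returns n masks, each h x w, with values in (0, 1).
    """
    k = len(prototypes)
    h = len(prototypes[0])
    w = len(prototypes[0][0])
    masks = []
    for coeffs in coefficients:
        if len(coeffs) != k:
            raise ValueError("each instance needs one coefficient per prototype")
        mask = [
            [
                sigmoid(sum(coeffs[i] * prototypes[i][y][x] for i in range(k)))
                for x in range(w)
            ]
            for y in range(h)
        ]
        masks.append(mask)
    return masks


# Hypothetical toy prototypes (k = 2, each 2 x 2). Assumption: both heads
# reuse these maps and differ only in the coefficients they predict.
prototypes = [
    [[1.0, 0.0], [0.0, 0.0]],  # responds at the top-left pixel
    [[0.0, 0.0], [0.0, 1.0]],  # responds at the bottom-right pixel
]

# Hypothetical coefficients from the two heads, one instance each.
motion_coeffs = [[4.0, -4.0]]    # class-agnostic (motion) head
semantic_coeffs = [[-4.0, 4.0]]  # semantic head

motion_masks = assemble_masks(prototypes, motion_coeffs)
semantic_masks = assemble_masks(prototypes, semantic_coeffs)
```

In this toy setup the two heads produce different masks from the same prototype maps, which is one way to picture the two independent detection paths described above. A real model would predict the prototypes and coefficients with convolutional layers and crop each mask to its predicted box.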