Classical visual simultaneous localization and mapping (SLAM) algorithms usually assume that the environment is rigid. This assumption limits their applicability, because they cannot accurately estimate camera poses and world structure in real-life scenes containing moving objects (e.g. cars, bikes, pedestrians). To tackle this issue, we propose TwistSLAM: a semantic, dynamic, stereo SLAM system that can track dynamic objects in the environment. Our algorithm creates clusters of points according to their semantic class. Constraints between clusters are modeled as mechanical joints whose type depends on the semantic class. Using these constraints, a novel constrained bundle adjustment jointly estimates the poses and velocities of moving objects along with the classical world structure and camera trajectory. We evaluate our approach on several sequences from the public KITTI dataset and demonstrate quantitatively that it improves camera and object tracking compared to state-of-the-art approaches.
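To make the idea of a joint-constrained velocity more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an object's velocity is represented as a 6-vector twist (linear part first, angular part second), that the twist is expressed in a frame attached to the supporting plane, and that the joint for a given class (for example a car on a road) is planar. The function name `project_twist_to_planar_joint` and the vector layout are hypothetical choices made for illustration only.

```python
import numpy as np


def project_twist_to_planar_joint(twist, normal):
    """Keep only the motion a planar joint allows (illustrative assumption).

    twist  : 6-vector [vx, vy, vz, wx, wy, wz], expressed in a frame
             attached to the supporting plane (assumed layout).
    normal : 3-vector normal to the plane; it is normalized here.

    A planar joint permits translation within the plane and rotation
    about the plane normal, so the other three components are removed.
    """
    twist = np.asarray(twist, dtype=float)
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)

    v, w = twist[:3], twist[3:]
    v_in_plane = v - np.dot(v, n) * n   # drop translation along the normal
    w_about_normal = np.dot(w, n) * n   # keep only rotation about the normal
    return np.concatenate([v_in_plane, w_about_normal])


if __name__ == "__main__":
    # Hypothetical twist of a car-like cluster on a plane with normal +z.
    raw_twist = [1.0, 2.0, 3.0, 0.1, 0.2, 0.3]
    print(project_twist_to_planar_joint(raw_twist, [0.0, 0.0, 1.0]))
```

In this sketch the projection is idempotent: applying it twice gives the same result as applying it once. That is the behavior one would expect from a constraint that restricts a velocity to the subspace permitted by a joint. A full system would enforce such constraints inside the bundle adjustment rather than as a separate post-processing step.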