Dynamic environments such as urban areas remain challenging for popular visual-inertial odometry (VIO) algorithms. Existing datasets typically fail to capture the dynamic nature of these environments, which makes it difficult to quantitatively evaluate the robustness of existing VIO methods. To address this issue, we make three contributions. First, we provide the VIODE benchmark, a novel dataset recorded from a simulated UAV that navigates in challenging dynamic environments. The unique feature of the VIODE dataset is the systematic introduction of moving objects into the scenes. It includes three environments, each of which is available in four dynamic levels that progressively add moving objects. The dataset contains synchronized stereo images and IMU data, as well as ground-truth trajectories and instance segmentation masks. Second, we compare state-of-the-art VIO algorithms on the VIODE dataset and show that their performance degrades substantially in highly dynamic scenes. Third, we propose a simple extension for visual localization algorithms that relies on semantic information. Our results show that scene semantics are an effective way to mitigate the adverse effects of dynamic objects on VIO algorithms. Finally, we make the VIODE dataset publicly available at https://github.com/kminoda/VIODE.
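The abstract does not describe how the semantic extension works. One plausible reading is that feature points lying on moving objects are discarded before they reach the estimator. The following is a minimal sketch of that idea only, not the authors' implementation. The function name `filter_static_keypoints`, the list-of-rows mask layout, and the set of dynamic instance IDs are all assumptions made for illustration.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates; an assumed convention


def filter_static_keypoints(
    keypoints: Iterable[Point],
    instance_mask: Sequence[Sequence[int]],
    dynamic_ids: Iterable[int],
) -> List[Point]:
    """Keep only keypoints whose pixel is not labelled as a dynamic instance.

    instance_mask is a hypothetical H x W grid of integer instance IDs
    (rows indexed by y, columns by x). dynamic_ids lists the IDs that are
    assumed to belong to moving objects.
    """
    height = len(instance_mask)
    width = len(instance_mask[0]) if height else 0
    if height == 0 or width == 0:
        raise ValueError("instance_mask must be a non-empty 2D grid")

    dynamic = set(dynamic_ids)
    kept: List[Point] = []
    for x, y in keypoints:
        # Round to the nearest pixel and clamp to the image bounds.
        col = min(max(int(round(x)), 0), width - 1)
        row = min(max(int(round(y)), 0), height - 1)
        if instance_mask[row][col] not in dynamic:
            kept.append((x, y))
    return kept


# Toy example: a 4 x 6 mask where instance 7 (assumed dynamic) covers a block.
mask = [[0] * 6 for _ in range(4)]
for r in (1, 2):
    for c in (2, 3, 4):
        mask[r][c] = 7

points = [(0, 0), (3, 1), (5, 3), (2.4, 2.0)]
static_points = filter_static_keypoints(points, mask, {7})
```

In this toy case the two points that fall on instance 7 are dropped and the other two are kept. A real pipeline would apply the same check to the tracked features of each frame before they are passed to the VIO back end.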