Introducing semantically meaningful objects into visual Simultaneous Localization and Mapping (SLAM) has the potential to improve both the accuracy and reliability of pose estimates, especially in challenging scenarios with significant viewpoint and appearance changes. However, how semantic objects should be represented for efficient inclusion in optimization-based SLAM frameworks is still an open question. Superquadrics (SQs) are an efficient and compact object representation, able to represent most common object types with high fidelity, and are typically retrieved from 3D point-cloud data. However, accurate 3D point-cloud data might not be available in all applications. Recent advances in machine learning have enabled robust object recognition and semantic mask measurements from camera images under many different appearance conditions. We propose a pipeline that leverages such semantic mask measurements to fit SQ parameters to multi-view camera observations using a multi-stage initialization and optimization procedure. In preliminary simulation experiments, we demonstrate the system's ability to recover randomly generated SQ parameters from multi-view mask observations and evaluate different initialization stages and cost functions.
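For context, a superquadric in its canonical object frame is commonly described by the implicit inside-outside function below; the notation is illustrative and may differ from the exact parametrization used in the proposed pipeline:
\[
F(x, y, z) \;=\; \left( \left(\frac{x}{a_x}\right)^{\frac{2}{\varepsilon_2}} + \left(\frac{y}{a_y}\right)^{\frac{2}{\varepsilon_2}} \right)^{\frac{\varepsilon_2}{\varepsilon_1}} + \left(\frac{z}{a_z}\right)^{\frac{2}{\varepsilon_1}} \;=\; 1 .
\]
Here $a_x, a_y, a_z$ are the extents along the principal axes and $\varepsilon_1, \varepsilon_2$ are shape exponents controlling the roundness of the surface ($F < 1$ inside, $F > 1$ outside). Combined with a 6-DoF pose, this yields only a handful of parameters per object, which is what makes SQs attractive as compact landmarks in optimization-based SLAM.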