Indoor 3D object detection is an essential task in single-image scene understanding and fundamentally shapes the spatial cognition needed for visual reasoning. Existing works on 3D object detection from a single image either predict each object independently or reason implicitly over all possible objects, failing to harness the relational geometric information between objects. To address this problem, we propose Explicit3D, a dynamic sparse graph pipeline built on object geometry and semantic features. For efficiency, we further define a relatedness score and design a novel dynamic pruning algorithm, followed by a cluster sampling method, for sparse scene graph generation and updating. Furthermore, Explicit3D introduces homogeneous matrices and defines new relative and corner losses to explicitly model the spatial differences between target pairs. Instead of using ground-truth labels as direct supervision, our relative and corner losses are derived from the homogeneous transformation, which drives the model to learn the geometric consistency between objects. Experimental results on the SUN RGB-D dataset demonstrate that our Explicit3D achieves a better balance between performance and efficiency than the state of the art.
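As a minimal illustrative sketch only (the abstract does not give the exact formulation, so the symbols below are assumptions rather than the paper's definitions), the relative supervision can be read as follows: each object pose is written as a 4x4 homogeneous matrix, and the loss compares the predicted pairwise transform with the one implied by the ground truth,

\[
T_i =
\begin{bmatrix}
R_i & t_i \\
\mathbf{0}^{\top} & 1
\end{bmatrix},
\qquad
T_{i \to j} = T_j \, T_i^{-1},
\qquad
\mathcal{L}_{\mathrm{rel}} = \sum_{(i,j)} \bigl\lVert \hat{T}_{i \to j} - T_{i \to j} \bigr\rVert_{1},
\]

where $R_i$ and $t_i$ denote the rotation and translation of object $i$, $\hat{T}_{i \to j}$ is the relative transform computed from predicted poses, and $T_{i \to j}$ is derived from the ground-truth poses. Under this reading, the network is supervised on the pairwise transformation rather than on each pose in isolation, which is what encourages geometric consistency between objects.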