We propose a novel scene flow estimation approach to capture and infer 3D motions from point clouds. Estimating 3D motions for point clouds is challenging, since a point cloud is unordered and its density is significantly non-uniform. Such unstructured data pose difficulties in matching corresponding points between point clouds, leading to inaccurate flow estimation. We propose a novel architecture named Sparse Convolution-Transformer Network (SCTN) that equips the sparse convolution with the transformer. Specifically, by leveraging the sparse convolution, SCTN transfers irregular point clouds into locally consistent flow features for estimating continuous and consistent motions within an object/local object part. We further propose to explicitly learn point relations using a point transformer module, different from existing methods. We show that the learned relation-based contextual information is rich and helpful for matching corresponding points, benefiting scene flow estimation. In addition, a novel loss function is proposed to adaptively encourage flow consistency according to feature similarity. Extensive experiments demonstrate that our proposed approach achieves a new state of the art in scene flow estimation. Our approach achieves an EPE3D error of 0.038 on FlyingThings3D and 0.037 on KITTI Scene Flow, outperforming previous methods by large margins.
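The abstract does not spell out the adaptive consistency loss, so the following is a minimal sketch of one plausible form: flow differences between neighboring points are penalized in proportion to their feature similarity, so that points likely belonging to the same object part are pushed toward consistent motion. The function name, the k-NN neighborhood input, and the use of cosine similarity with a softmax weighting are all illustrative assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def adaptive_consistency_loss(flow, feat, knn_idx):
    """Hypothetical flow-consistency loss weighted by feature similarity.

    flow:    (N, 3) predicted scene flow per point
    feat:    (N, C) per-point features
    knn_idx: (N, K) indices of the K nearest neighbors of each point
    """
    neigh_flow = flow[knn_idx]   # (N, K, 3) flows of each point's neighbors
    neigh_feat = feat[knn_idx]   # (N, K, C) features of each point's neighbors

    # Cosine similarity between a point's feature and its neighbors' features;
    # a softmax turns it into weights that emphasize feature-similar neighbors.
    sim = F.cosine_similarity(feat.unsqueeze(1), neigh_feat, dim=-1)  # (N, K)
    weight = torch.softmax(sim, dim=-1)                               # (N, K)

    # Penalize flow differences more where features are more similar.
    diff = (flow.unsqueeze(1) - neigh_flow).norm(dim=-1)              # (N, K)
    return (weight * diff).sum(dim=-1).mean()
```

Under this reading, dissimilar neighbors (e.g., across object boundaries) receive low weights, so the loss enforces consistency within an object or object part without smoothing flow across different objects.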