Autonomous vehicles operate in highly dynamic environments, necessitating an accurate assessment of which aspects of a scene are moving and where they are moving to. A popular approach to 3D motion estimation, termed scene flow, is to employ 3D point cloud data from consecutive LiDAR scans, although such approaches have been limited by the small size of real-world, annotated LiDAR data. In this work, we introduce a new large-scale dataset for scene flow estimation derived from corresponding tracked 3D objects, which is $\sim$1,000$\times$ larger than previous real-world datasets in terms of the number of annotated frames. We demonstrate how previous works were bounded by the amount of real LiDAR data available, suggesting that larger datasets are required to achieve state-of-the-art predictive performance. Furthermore, we show how previous heuristics for operating on point clouds, such as down-sampling, heavily degrade performance, motivating a new class of models that are tractable on the full point cloud. To address this issue, we introduce the FastFlow3D architecture, which provides real-time inference on the full point cloud. Additionally, we design human-interpretable metrics that better capture real-world aspects by accounting for ego-motion and providing breakdowns per object type. We hope that this dataset may provide new opportunities for developing real-world scene flow systems.
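To make the metric design concrete, the sketch below illustrates one way an ego-motion-aware, per-object-type evaluation could be computed: the apparent motion that the ego vehicle's own movement induces on stationary points is removed before deciding which points are truly moving, and endpoint error is then reported separately per object type and per moving/stationary status. This is a minimal sketch under assumed conventions; the function name, array layouts, time step, and speed threshold are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def flow_metrics(points_prev, pred_flow, gt_flow, ego_transform, labels,
                 dt=0.1, moving_speed=0.5):
    """Mean L2 endpoint error per object type, split into moving vs.
    stationary points after compensating for ego-motion (illustrative).

    points_prev   : (N, 3) points from the earlier LiDAR scan
    pred_flow     : (N, 3) predicted motion vectors over the scan interval
    gt_flow       : (N, 3) annotated motion vectors over the scan interval
    ego_transform : (4, 4) rigid transform taking the earlier frame into
                    the later frame (e.g., from vehicle odometry)
    labels        : (N,) integer object type per point (vehicle, pedestrian, ...)
    """
    # Apparent motion of a perfectly stationary point caused purely by
    # the ego vehicle's own movement between the two scans.
    homog = np.hstack([points_prev, np.ones((len(points_prev), 1))])
    ego_flow = (homog @ ego_transform.T)[:, :3] - points_prev

    # A point counts as "moving" only if its ego-compensated speed
    # exceeds the threshold (assumed value, in m/s).
    residual_speed = np.linalg.norm(gt_flow - ego_flow, axis=1) / dt
    is_moving = residual_speed > moving_speed

    # Standard L2 endpoint error between predicted and annotated flow.
    epe = np.linalg.norm(pred_flow - gt_flow, axis=1)

    results = {}
    for obj_type in np.unique(labels):
        mask = labels == obj_type
        moving, still = mask & is_moving, mask & ~is_moving
        results[int(obj_type)] = {
            "epe_moving": float(epe[moving].mean()) if moving.any() else None,
            "epe_stationary": float(epe[still].mean()) if still.any() else None,
        }
    return results
```

Splitting the breakdown this way keeps the metric human-interpretable: without ego-motion compensation, every background point appears to move whenever the vehicle does, and errors on genuinely moving objects such as pedestrians would be washed out by the far more numerous stationary points.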