Predicting how the world can evolve in the future is crucial for motion planning in autonomous systems. Classical methods are limited because they rely on costly human annotations, in the form of semantic class labels, bounding boxes, and tracks or HD maps of cities, to plan motion, and are therefore difficult to scale to large unlabeled datasets. One promising self-supervised task is 3D point cloud forecasting from unannotated LiDAR sequences. We show that this task requires algorithms to implicitly capture (1) sensor extrinsics (i.e., the egomotion of the autonomous vehicle), (2) sensor intrinsics (i.e., the sampling pattern specific to the particular LiDAR sensor), and (3) the shape and motion of other objects in the scene. But autonomous systems should make predictions about the world, not about their sensors. To this end, we factor out (1) and (2) by recasting the task as one of spacetime (4D) occupancy forecasting. Because ground-truth 4D occupancy is expensive to obtain, we instead render point cloud data from 4D occupancy predictions given sensor extrinsics and intrinsics, which allows one to train and test occupancy algorithms with unannotated LiDAR sequences. This also allows one to evaluate and compare point cloud forecasting algorithms across diverse datasets, sensors, and vehicles.
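The rendering step described above can be illustrated with a minimal sketch: given a voxelized occupancy prediction for one future timestep, sensor extrinsics supply the ray origins, sensor intrinsics supply the ray directions, and an expected termination depth can be computed per ray by marching through the grid. The code below is an illustrative assumption, not the paper's implementation; all function names, grid parameters, and the expected-depth formulation are hypothetical, written in PyTorch.

    import torch

    def render_expected_depth(occ, origins, dirs,
                              t_max=70.0, n_steps=256,
                              voxel_size=0.2, grid_min=(-35.0, -35.0, -2.5)):
        # occ:     (X, Y, Z) occupancy probabilities in [0, 1] for one future timestep
        # origins: (R, 3) ray origins from sensor extrinsics (ego pose)
        # dirs:    (R, 3) unit ray directions from sensor intrinsics (LiDAR sampling pattern)
        # returns: (R,) expected depth at which each ray terminates
        device = occ.device
        ts = torch.linspace(0.0, t_max, n_steps, device=device)               # (S,)
        pts = origins[:, None, :] + dirs[:, None, :] * ts[None, :, None]      # (R, S, 3)

        # Nearest-neighbor lookup of occupancy at each sample point along each ray.
        grid_min = torch.tensor(grid_min, device=device)
        dims = torch.tensor(occ.shape, device=device)
        idx = ((pts - grid_min) / voxel_size).long()                          # (R, S, 3)
        in_bounds = ((idx >= 0) & (idx < dims)).all(dim=-1).float()           # (R, S)
        idx = torch.minimum(idx.clamp(min=0), dims - 1)
        p = occ[idx[..., 0], idx[..., 1], idx[..., 2]] * in_bounds            # (R, S)

        # Probability that the ray first terminates at sample i: p_i * prod_{j<i}(1 - p_j).
        trans = torch.cumprod(
            torch.cat([torch.ones_like(p[:, :1]), 1.0 - p[:, :-1]], dim=1), dim=1)
        weights = p * trans                                                   # (R, S)
        return (weights * ts[None, :]).sum(dim=1)                            # (R,)

    # Toy usage: 100 random rays cast through a 200 x 200 x 16 occupancy grid.
    occ = torch.rand(200, 200, 16)
    origins = torch.zeros(100, 3)
    dirs = torch.nn.functional.normalize(torch.randn(100, 3), dim=1)
    depths = render_expected_depth(occ, origins, dirs)                        # shape (100,)

Rendered depths of this kind can then be compared against the observed LiDAR returns (e.g., with an L1 loss), which is what allows unannotated LiDAR sequences to supervise and evaluate occupancy forecasting.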