Learning-based perception and prediction modules in modern autonomous driving systems typically rely on expensive human annotation and are designed to perceive only a handful of predefined object categories. This closed-set paradigm is insufficient for the safety-critical autonomous driving task, where the autonomous vehicle must handle arbitrarily many types of traffic participants and their motion behaviors in a highly dynamic world. To address this difficulty, this paper pioneers a novel and challenging direction: training perception and prediction models to understand open-set moving objects with no human supervision. Our proposed framework uses self-learned flow to trigger an automated meta-labeling pipeline that provides automatic supervision. 3D detection experiments on the Waymo Open Dataset show that our method significantly outperforms classical unsupervised approaches and is even competitive with its counterpart that uses supervised scene flow. We further show that our approach generates highly promising results in open-set 3D detection and trajectory prediction, confirming its potential to close the safety gap of fully supervised systems.