Multi-task learning has emerged as a powerful paradigm for solving a range of tasks simultaneously and efficiently, in terms of both computational resources and inference time. However, most of these algorithms are designed for tasks outside the scope of autonomous driving, which makes it hard to compare multi-task methods in this domain. To enable comprehensive evaluation of present multi-task learning methods in autonomous driving, we extensively investigate the performance of popular multi-task methods on a large-scale driving dataset covering four common perception tasks, i.e., object detection, semantic segmentation, drivable area segmentation, and lane detection. We provide an in-depth analysis of current multi-task learning methods under different common settings and find that existing methods make progress, yet a large performance gap remains compared with single-task baselines. To alleviate this dilemma in autonomous driving, we present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting to guide the model toward learning high-quality task-specific representations. Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories and further mitigate the performance gap. Furthermore, we bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving. Comprehensive experimental results on the diverse self-driving dataset BDD100K show that VE-Prompt improves the multi-task baseline and further surpasses single-task models.
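As a rough illustration of the exemplar-generation step described above, the minimal sketch below crops a target-category region from an image using a ground-truth bounding box and resizes it to a fixed resolution so it can serve as a visual exemplar; the function name `crop_exemplar` and the exemplar resolution are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only (not the authors' code): build one visual exemplar
# for a detection category by cropping its ground-truth box from the image.
import numpy as np
import cv2


def crop_exemplar(image: np.ndarray, box: tuple, size: int = 96) -> np.ndarray:
    """Crop the region given by a ground-truth box (x1, y1, x2, y2) and
    resize it to (size, size); the 96x96 resolution is an assumption."""
    x1, y1, x2, y2 = (int(v) for v in box)
    patch = image[y1:y2, x1:x2]
    return cv2.resize(patch, (size, size), interpolation=cv2.INTER_LINEAR)


# Hypothetical usage with a BDD100K-style annotation:
# image = cv2.imread("example.jpg")              # H x W x 3 uint8 array
# exemplar = crop_exemplar(image, (120, 80, 260, 210))
```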