Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving

To achieve a holistic understanding of multiple downstream tasks simultaneously, a model needs to extract features with better transferability. Although many recent self-supervised pre-training methods have achieved impressive performance on various vision tasks under the prevailing pretrain-finetune paradigm, their ability to generalize to multi-task learning scenarios has yet to be explored. In this paper, we extensively investigate the transfer performance of various types of self-supervised methods, e.g., MoCo and SimCLR, on three downstream tasks, namely semantic segmentation, drivable area segmentation, and traffic object detection, on the large-scale driving dataset BDD100K. Surprisingly, we find that their performance is sub-optimal or even lags far behind the single-task baseline, which may be due to the differences in training objectives and architectural design inherent in the pretrain-finetune paradigm. To overcome this dilemma while avoiding a redesign of the resource-intensive pre-training stage, we propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training, in which off-the-shelf pretrained models can be effectively adapted without increasing the training overhead. During the adapt stage, we use learnable multi-scale adapters, supervised by the multi-task objectives, to dynamically adjust the pretrained model weights while leaving the pretrained knowledge untouched. Furthermore, we regard the vision-language pre-training model CLIP as a strong complement to the pretrain-adapt-finetune paradigm and propose a novel adapter named LV-Adapter, which incorporates language priors into the multi-task model via task-specific prompting and alignment between visual and textual features.
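The abstract does not specify the adapter architecture, so the following is only a minimal sketch of two generic ideas it mentions, not the authors' implementation. It assumes (1) a common residual bottleneck adapter, `y = x + up(relu(down(x)))`, attached to each scale of a frozen backbone's feature pyramid, with the up-projection zero-initialized so that the adapted features start out identical to the pretrained ones, and (2) a CLIP-style alignment that scores each pixel feature against text embeddings of task-specific prompts by cosine similarity. The class `BottleneckAdapter`, the function `align_with_text`, the strides, channel widths, bottleneck size, and the random stand-ins for backbone features and prompt embeddings are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter for one frozen feature map.

    Computes y = x + relu(x @ down) @ up over the channel axis. `up` starts
    at zero, so the adapter is an exact identity at initialization and the
    frozen pretrained features pass through unchanged.
    """

    def __init__(self, channels, bottleneck):
        self.down = rng.normal(0.0, 0.02, size=(channels, bottleneck))
        self.up = np.zeros((bottleneck, channels))

    def __call__(self, x):
        # x: (H, W, C) feature map from one frozen backbone stage
        hidden = np.maximum(x @ self.down, 0.0)  # (H, W, bottleneck)
        return x + hidden @ self.up              # residual, back to (H, W, C)

    def num_params(self):
        # Only the adapter weights would be trained; the backbone stays frozen.
        return self.down.size + self.up.size


def align_with_text(pixel_feats, text_embeds):
    """Cosine similarity between every pixel feature (H, W, C) and every
    prompt embedding (K, C), giving per-pixel logits of shape (H, W, K)."""
    p = pixel_feats / np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    return p @ t.T


# Hypothetical multi-scale features (strides 8, 16, 32) standing in for a
# frozen pretrained backbone's outputs.
feats = {8: rng.normal(size=(32, 32, 64)),
         16: rng.normal(size=(16, 16, 128)),
         32: rng.normal(size=(8, 8, 256))}
adapters = {s: BottleneckAdapter(f.shape[-1], 16) for s, f in feats.items()}
adapted = {s: adapters[s](f) for s, f in feats.items()}

# With zero-initialized up-projections, every adapter is an identity map.
assert all(np.allclose(adapted[s], feats[s]) for s in feats)

# Random stand-ins for text-encoder embeddings of K=3 task prompts.
prompts = rng.normal(size=(3, 256))
logits = align_with_text(adapted[32], prompts)
print(logits.shape)  # (8, 8, 3)
```

The zero initialization is one simple way to make "leaving the pretrained knowledge untouched" concrete: training starts from exactly the pretrained representation, and the multi-task losses only move the small adapter weights. In a real system the prompt embeddings would come from CLIP's text encoder rather than a random generator.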
