The bird's-eye-view (BEV) representation allows robust learning of multiple autonomous driving tasks, including road layout estimation and 3D object detection. However, contemporary methods for unified road layout estimation and 3D object detection rarely address the class imbalance of the training dataset or multi-class learning, which would reduce the total number of networks required. To overcome these limitations, we propose a unified model for road layout estimation and 3D object detection inspired by the transformer architecture and the CycleGAN learning framework. The proposed model mitigates the performance degradation caused by the class imbalance of the dataset by utilizing the focal loss and the proposed dual cycle loss. Moreover, we set up extensive learning scenarios to study the effect of multi-class learning on road layout estimation in various situations. To verify the effectiveness of the proposed model and the learning scheme, we conduct thorough ablation and comparative studies. The experimental results attest to the effectiveness of our model; we achieve state-of-the-art performance in both the road layout estimation and 3D object detection tasks.
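The focal loss mentioned above down-weights easy, well-classified examples so that training focuses on the rare classes that dominate the error. A minimal sketch of the binary form is given below; the values γ = 2 and α = 0.25 are the common defaults from the original focal loss formulation, not values confirmed by this paper.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one predicted probability p in (0, 1) with label y in {0, 1}.

    gamma and alpha are the common defaults; the hyperparameters actually
    used in this work are an assumption here.
    """
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    a_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # The (1 - p_t)^gamma factor shrinks the loss of confident predictions,
    # so gradients are dominated by hard (minority-class) examples.
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

For an easy positive (p = 0.95) the modulating factor (1 − p_t)^γ is tiny, so its loss is orders of magnitude smaller than that of a hard positive (p = 0.1), which is exactly the behavior that counteracts class imbalance.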