Lane detection, the process of identifying lane markings as approximated curves, is widely used for lane departure warning and adaptive cruise control in autonomous vehicles. The popular pipeline that solves it in two steps, feature extraction followed by post-processing, while useful, is inefficient and flawed in learning the global context and lanes' long, thin structures. To tackle these issues, we propose an end-to-end method that directly outputs the parameters of a lane shape model, using a network built with a transformer to learn richer structures and context. The lane shape model is formulated based on road structures and camera pose, providing a physical interpretation for the parameters output by the network. The transformer models non-local interactions with a self-attention mechanism to capture slender structures and global context. The proposed method is validated on the TuSimple benchmark and achieves state-of-the-art accuracy with the most lightweight model size and the fastest speed. Additionally, our method shows excellent adaptability to a challenging self-collected lane detection dataset, demonstrating its strong potential for deployment in real applications. Code is available at https://github.com/liuruijin17/LSTR.
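To make the idea described above concrete, below is a minimal sketch (not the authors' LSTR implementation) of how a transformer can directly regress lane curve parameters end-to-end: a small CNN backbone produces feature tokens, self-attention models non-local interactions among them, and learned per-lane queries are decoded into a fixed set of curve parameters without post-processing. The backbone, layer sizes, number of lane queries, and the 4-parameter curve head are illustrative assumptions, not values or formulations taken from the paper; positional encodings are also omitted for brevity.

```python
import torch
import torch.nn as nn


class CurveParameterTransformer(nn.Module):
    """Sketch of a transformer that regresses lane-curve parameters directly."""

    def __init__(self, d_model=128, n_lanes=4, n_params=4):
        super().__init__()
        # Small convolutional backbone (assumed): image -> feature map.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Encoder self-attention captures global context over feature tokens;
        # decoder queries attend to the whole image to handle slender lanes.
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        # One learned query per candidate lane.
        self.lane_queries = nn.Parameter(torch.randn(n_lanes, d_model))
        # Head that regresses curve parameters (e.g., polynomial coefficients).
        self.param_head = nn.Linear(d_model, n_params)

    def forward(self, images):
        feats = self.backbone(images)                 # (B, C, H, W)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)     # (B, H*W, C)
        queries = self.lane_queries.unsqueeze(0).expand(b, -1, -1)
        decoded = self.transformer(tokens, queries)   # (B, n_lanes, C)
        return self.param_head(decoded)               # (B, n_lanes, n_params)


if __name__ == "__main__":
    model = CurveParameterTransformer()
    params = model(torch.randn(2, 3, 128, 256))
    print(params.shape)  # torch.Size([2, 4, 4])
```

In this sketch, each output row can be interpreted as the coefficients of one lane's curve (for instance a cubic in the image plane), which is what allows the network to be trained end-to-end against a parameterized lane shape model rather than per-pixel labels.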