Transformer architectures have demonstrated promising performance in a variety of autonomous driving applications in recent years. Meanwhile, their dedicated hardware acceleration on portable computational platforms has become the next critical step toward practical deployment in real autonomous vehicles. This survey provides a comprehensive overview, benchmark, and analysis of Transformer-based models tailored to autonomous driving tasks such as lane detection, segmentation, tracking, planning, and decision-making. We review different architectures for organizing Transformer inputs and outputs, including encoder-decoder and encoder-only structures, and examine their respective advantages and disadvantages. Furthermore, we discuss Transformer-related operators and their hardware acceleration schemes in depth, taking into account key factors such as quantization and runtime. In particular, we present an operator-level comparison between layers from convolutional neural networks, the Swin Transformer, and a Transformer with a 4D encoder. The paper also highlights challenges, trends, and current insights for Transformer-based models, addressing their hardware deployment and acceleration issues in the context of long-term autonomous driving applications.