We propose a novel and pragmatic framework for traffic scene perception with roadside cameras. The proposed framework covers the full stack of the roadside perception pipeline for infrastructure-assisted autonomous driving, including object detection, object localization, object tracking, and multi-camera information fusion. Unlike previous vision-based perception frameworks that rely on depth offsets or 3D annotations at training time, we adopt a modular decoupled design and introduce a landmark-based 3D localization method, in which detection and localization are decoupled so that the model can be trained with only 2D annotations. The proposed framework applies to both optical and thermal cameras with pinhole or fish-eye lenses. Our framework is deployed at a two-lane roundabout at Ellsworth Rd. and State St., Ann Arbor, MI, USA, providing 24/7 real-time traffic flow monitoring and high-precision vehicle trajectory extraction. The whole system runs efficiently on a low-power edge computing device, with an all-component end-to-end delay of less than 20 ms.
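To make the decoupling concrete, the sketch below illustrates one common way landmark-based localization can work: if the surveyed landmarks lie on the road plane, mapping a 2D detection to ground coordinates reduces to a planar homography, so the detector itself needs only 2D supervision. This is a minimal illustration under that assumption, not the paper's exact method; the landmark coordinates, the bottom-center ground-contact heuristic, and the helper names are hypothetical (a fish-eye image would first be undistorted to a pinhole model).

import numpy as np
import cv2

# Surveyed landmarks: pixel positions in the camera image and their known
# positions on the road plane (local metric frame). Placeholder values.
landmarks_px = np.array(
    [[412, 651], [980, 640], [1210, 402], [240, 418]], dtype=np.float32)
landmarks_world = np.array(
    [[0.0, 0.0], [7.3, 0.0], [7.3, 25.0], [0.0, 25.0]], dtype=np.float32)

# Estimate the image-to-ground homography once per camera (offline step).
H, _ = cv2.findHomography(landmarks_px, landmarks_world, cv2.RANSAC)

def localize(bbox_xyxy):
    """Map a 2D detection box to road-plane coordinates via the homography.

    Uses the bottom-center of the bounding box as the vehicle's ground
    contact point, a common approximation for roadside cameras.
    """
    x1, y1, x2, y2 = bbox_xyxy
    foot = np.array([[[(x1 + x2) / 2.0, y2]]], dtype=np.float32)  # (1, 1, 2)
    ground = cv2.perspectiveTransform(foot, H)
    return ground[0, 0]  # (x, y) on the road plane

# Example: localize one detected vehicle box (pixel coordinates assumed).
print(localize([600.0, 500.0, 700.0, 560.0]))

Because the homography is fitted from 2D-to-ground correspondences alone, the detector and the localizer can be trained and calibrated independently, which is the essence of the decoupling described above.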