Autonomous driving requires the model to perceive the environment and (re)act with low latency for the sake of safety. While past works ignore the inevitable changes in the environment that occur during processing, streaming perception has been proposed to jointly evaluate latency and accuracy with a single metric for online video perception. In this paper, instead of searching for trade-offs between accuracy and speed as previous works do, we point out that endowing real-time models with the ability to predict the future is the key to this problem. We build a simple and effective framework for streaming perception. It is equipped with a novel Dual-Flow Perception (DFP) module, which combines a dynamic flow and a static flow to capture the moving trend and the basic detection features for streaming prediction. Furthermore, we introduce a Trend-Aware Loss (TAL) combined with a trend factor that generates adaptive weights for objects moving at different speeds. Our simple method achieves competitive performance on the Argoverse-HD dataset, improving AP by 4.9% over a strong baseline and validating its effectiveness. Our code will be made available at https://github.com/yancie-yjr/StreamYOLO.
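The adaptive weighting idea behind TAL can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes the trend factor is derived from the IoU of each object's box between consecutive frames (low IoU means fast motion, hence a harder streaming prediction), and the function name, threshold `tau`, and weight `w_fast` are all hypothetical.

```python
def trend_aware_weights(ious, tau=0.3, w_fast=1.6):
    """Illustrative per-object loss weights from a motion trend factor.

    ious: IoU of each object's box between two consecutive frames.
    A low IoU signals a fast-moving object, which gets a larger
    weight so the loss focuses on hard-to-predict objects.
    All names and thresholds here are illustrative assumptions.
    """
    raw = [w_fast if iou < tau else 1.0 for iou in ious]
    # Renormalize so the mean weight stays 1 and the overall
    # loss magnitude is unchanged by the reweighting.
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]
```

For example, with three objects whose inter-frame IoUs are `[0.9, 0.1, 0.8]`, the fast mover (IoU 0.1) receives a larger normalized weight than the two near-static ones, while the weights still sum to the number of objects.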