Streaming perception is the task of reporting the current state of the driving scene in autonomous driving, jointly accounting for the latency and accuracy of the perception system. However, existing streaming perception methods use only the current frame and its adjacent frame as input to learn motion patterns, which cannot model complex real-world scenes and leads to detection failures. To address this problem, we propose an end-to-end dual-path network, dubbed LongShortNet, which captures long-term temporal motion and calibrates it with short-term spatial semantics for real-time perception. Moreover, we introduce a Long-Short Fusion Module (LSFM) to explore spatiotemporal feature fusion; to the best of our knowledge, this is the first work to extend long-term temporal modeling to streaming perception. We evaluate the proposed LongShortNet against existing methods on the benchmark dataset Argoverse-HD. The results demonstrate that LongShortNet outperforms other state-of-the-art methods with almost no extra computational cost.
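For intuition, below is a minimal PyTorch-style sketch of the dual-path idea described above: a short-term path encodes spatial features of the current frame, a long-term path aggregates features buffered from several past frames, and the two are fused before detection. The class and layer names, the use of plain convolutions, and the concatenation-based fusion are illustrative assumptions only; the paper's actual LSFM design may differ.

```python
import torch
import torch.nn as nn


class LongShortFusionSketch(nn.Module):
    """Illustrative dual-path fusion (not the paper's exact LSFM).

    Short-term path: spatial semantics of the current frame.
    Long-term path: temporal motion cues from buffered past-frame features.
    Fusion: channel concatenation followed by a 1x1 convolution (an assumption).
    """

    def __init__(self, channels: int = 256, num_past_frames: int = 3):
        super().__init__()
        self.num_past_frames = num_past_frames
        # Short-term path over the current frame's feature map.
        self.short_path = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Long-term path over concatenated past-frame feature maps.
        self.long_path = nn.Conv2d(channels * num_past_frames, channels, kernel_size=1)
        # Fuse the two paths back to the original channel width.
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, current_feat: torch.Tensor, past_feats: list) -> torch.Tensor:
        short = self.short_path(current_feat)
        long = self.long_path(torch.cat(past_feats, dim=1))
        return self.fuse(torch.cat([short, long], dim=1))


if __name__ == "__main__":
    # One current-frame feature map and a buffer of three past-frame feature maps.
    cur = torch.randn(1, 256, 25, 40)
    past = [torch.randn(1, 256, 25, 40) for _ in range(3)]
    out = LongShortFusionSketch()(cur, past)
    print(out.shape)  # torch.Size([1, 256, 25, 40])
```

The sketch keeps the fused output at the same resolution and channel width as the current-frame features, so a standard detection head could consume it with almost no extra cost, mirroring the efficiency claim above.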