Robust real-time detection and motion forecasting of traffic participants is necessary for autonomous vehicles to safely navigate urban environments. In this paper, we present RV-FuseNet, a novel end-to-end approach for joint detection and trajectory estimation directly from time-series LiDAR data. Instead of the widely used bird's eye view (BEV) representation, we utilize the native range view (RV) representation of LiDAR data. The RV preserves the full resolution of the sensor by avoiding the voxelization used in the BEV. Furthermore, the RV can be processed efficiently due to its compactness. Previous approaches project time-series data to a common viewpoint for temporal fusion, and often this viewpoint differs from the one where the data was captured. This is sufficient for BEV methods, but for RV methods it can lead to loss of information and data distortion, which adversely impact performance. To address this challenge, we propose a simple yet effective novel architecture, \textit{Incremental Fusion}, that minimizes the information loss by sequentially projecting each RV sweep into the viewpoint of the next sweep in time. We show that our approach significantly improves motion forecasting performance over the existing state-of-the-art. Furthermore, we demonstrate that our sequential fusion approach is superior to alternative RV-based fusion methods on multiple datasets.
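The core sequential-projection idea behind \textit{Incremental Fusion} can be illustrated with a minimal sketch. This is an assumption-laden toy example operating on raw 3D points with known per-sweep poses, not the paper's actual implementation (which warps learned range-view feature maps); the function names and the homogeneous-pose convention are hypothetical.

```python
import numpy as np

def to_next_frame(points, pose_cur, pose_next):
    # Rigidly transform points (N, 3) from the current sweep's sensor frame
    # into the next sweep's sensor frame. Poses are 4x4 sensor-to-world
    # homogeneous transforms (an assumed convention for this sketch).
    T = np.linalg.inv(pose_next) @ pose_cur
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    return (pts_h @ T.T)[:, :3]

def incremental_fusion(sweeps, poses):
    # Carry accumulated data forward one step at a time: each intermediate
    # result is re-expressed only in the *next* sweep's viewpoint, rather
    # than projecting every past sweep directly into the final viewpoint.
    fused = sweeps[0]
    for i in range(1, len(sweeps)):
        fused = to_next_frame(fused, poses[i - 1], poses[i])
        fused = np.vstack([fused, sweeps[i]])
    return fused  # all data expressed in the latest sweep's frame
```

In the actual method the per-step warp is applied to range-view feature images (where each reprojection can discard occluded or out-of-view pixels), which is why chaining small single-step projections loses less information than one large projection across the whole time horizon.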