Autonomous driving has gained significant traction in recent years due to its potential to change the way we commute. Much effort has been devoted to estimating the current state of a vehicle, while learning to forecast the vehicle's future state introduces new capabilities, such as anticipating dangerous situations. Moreover, forecasting offers new supervision opportunities, since the model learns to predict a richer context expressed by multiple horizons. Intuitively, the video stream from a front-facing camera is essential because it encodes information about the upcoming road; historical traces of the vehicle's states provide additional context. In this paper, we tackle multi-horizon forecasting of vehicle states by fusing these two modalities. We design and experiment with three end-to-end architectures that exploit 3D convolutions for visual feature extraction and 1D convolutions for feature extraction from speed and steering-angle traces. To demonstrate the effectiveness of our method, we perform extensive experiments on two publicly available real-world datasets, Comma2k19 and the Udacity challenge. We show that we are able to forecast a vehicle's state at various horizons, while outperforming the current state-of-the-art results on the related task of driving state estimation. We examine the contribution of vision features and find that a model fed with vision features achieves an error that is 56.6% and 66.9% of the error of a model that does not use those features, on the Udacity and Comma2k19 datasets respectively.
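To make the two-branch fusion idea concrete, here is a minimal sketch of such an architecture: a 3D-convolutional branch over the front-facing video clip and a 1D-convolutional branch over the historical speed and steering-angle traces, fused to predict a (speed, steering) pair per horizon. This is not the authors' implementation; the class name `MultiHorizonForecaster`, all layer sizes, and the horizon count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiHorizonForecaster(nn.Module):
    """Sketch of a two-branch vision + sensor-trace fusion model (hypothetical)."""

    def __init__(self, num_horizons: int = 3):
        super().__init__()
        # Vision branch: 3D convolutions over (channels, time, height, width).
        self.vision = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, stride=(2, 2, 2), padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # -> (B, 32, 1, 1, 1)
            nn.Flatten(),             # -> (B, 32)
        )
        # Sensor branch: 1D convolutions over past speed and steering-angle samples.
        self.sensors = nn.Sequential(
            nn.Conv1d(2, 16, kernel_size=3, padding=1),  # 2 channels: speed, steering
            nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),             # -> (B, 32)
        )
        # Fusion head: one (speed, steering) pair per forecasting horizon.
        self.head = nn.Sequential(
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, 2 * num_horizons),
        )

    def forward(self, video: torch.Tensor, traces: torch.Tensor) -> torch.Tensor:
        # video:  (B, 3, T, H, W) clip from the front-facing camera
        # traces: (B, 2, L) past speed and steering-angle trace
        fused = torch.cat([self.vision(video), self.sensors(traces)], dim=1)
        return self.head(fused)  # (B, 2 * num_horizons)

# Example: forecast 3 horizons from an 8-frame clip and 20 past sensor samples.
model = MultiHorizonForecaster(num_horizons=3)
out = model(torch.randn(1, 3, 8, 64, 64), torch.randn(1, 2, 20))
print(out.shape)  # torch.Size([1, 6])
```

The key design point illustrated here is late fusion: each modality is summarized by its own convolutional encoder before concatenation, so the head can weigh visual context against the vehicle's recent dynamics when producing per-horizon outputs.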