As urban environments manifest high levels of complexity, it is of vital importance that the safety systems embedded within autonomous vehicles (AVs) are able to accurately anticipate the short-term future motion of nearby agents. This problem can be framed as generating a sequence of coordinates describing the future motion of a tracked agent. Various proposed approaches demonstrate significant benefits of using a rasterised top-down image of the road, in combination with Convolutional Neural Networks (CNNs), to extract relevant features that define the road structure (e.g. driveable areas, lanes, walkways). In contrast, this paper explores the use of Capsule Networks (CapsNets) to learn a hierarchical representation of sparse semantic layers corresponding to small regions of a High-Definition (HD) map. Each region of the map is decomposed into separate geometric layers that are extracted with respect to the agent's current position. By using an architecture based on CapsNets, the model retains hierarchical relationships between detected features within images whilst also preventing the loss of spatial information often caused by the pooling operation. We train and evaluate our model on the publicly available nuScenes (nuTonomy scenes) dataset and compare it to recently published methods. We show that our model achieves a significant improvement over these methods on deterministic prediction, whilst drastically reducing the overall size of the network.
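As a rough illustration of the pipeline described above (a capsule-based encoder over stacked rasterised semantic map layers, regressing a short sequence of future coordinates), the PyTorch sketch below shows one way such a model can be assembled. All layer sizes, the number of semantic channels, the prediction horizon, and the class names (PrimaryCapsules, RoutingCapsules, CapsTrajectoryNet) are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch (not the authors' released code) of a CapsNet-style encoder
# mapping rasterised HD-map semantic layers to a short future trajectory.
import torch
import torch.nn as nn
import torch.nn.functional as F


def squash(s, dim=-1, eps=1e-8):
    """Non-linearity that keeps each capsule vector's length in [0, 1)."""
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)


class PrimaryCapsules(nn.Module):
    """Convolutional features reshaped into groups of small capsule vectors."""
    def __init__(self, in_ch, caps_dim=8, n_maps=16):
        super().__init__()
        self.caps_dim = caps_dim
        self.conv = nn.Conv2d(in_ch, n_maps * caps_dim, kernel_size=9, stride=2)

    def forward(self, x):
        u = self.conv(x)                                   # (B, n_maps*D, H, W)
        u = u.view(x.size(0), -1, self.caps_dim)           # (B, num_caps, D)
        return squash(u)


class RoutingCapsules(nn.Module):
    """Higher-level capsules computed with dynamic routing-by-agreement."""
    def __init__(self, in_caps, in_dim, out_caps=10, out_dim=16, iters=3):
        super().__init__()
        self.iters = iters
        self.W = nn.Parameter(0.01 * torch.randn(1, in_caps, out_caps, out_dim, in_dim))

    def forward(self, u):
        u = u[:, :, None, :, None]                         # (B, in_caps, 1, in_dim, 1)
        u_hat = (self.W @ u).squeeze(-1)                   # (B, in_caps, out_caps, out_dim)
        b = torch.zeros(u_hat.shape[:3], device=u_hat.device)
        for _ in range(self.iters):
            c = F.softmax(b, dim=2)                        # routing coefficients
            v = squash((c.unsqueeze(-1) * u_hat).sum(dim=1))
            b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)   # agreement update
        return v                                           # (B, out_caps, out_dim)


class CapsTrajectoryNet(nn.Module):
    """Encode stacked semantic layers, regress T future (x, y) way-points."""
    def __init__(self, in_channels=7, horizon=6):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, 64, kernel_size=9)   # 64x64 -> 56x56
        self.primary = PrimaryCapsules(64)                      # 56x56 -> 24x24
        self.routing = RoutingCapsules(in_caps=16 * 24 * 24, in_dim=8)
        self.head = nn.Linear(10 * 16, horizon * 2)
        self.horizon = horizon

    def forward(self, raster):
        x = F.relu(self.stem(raster))
        v = self.routing(self.primary(x))
        return self.head(v.flatten(1)).view(-1, self.horizon, 2)


if __name__ == "__main__":
    raster = torch.randn(2, 7, 64, 64)        # 7 hypothetical semantic layers
    print(CapsTrajectoryNet()(raster).shape)  # torch.Size([2, 6, 2])
```

The routing loop stands in for pooling: lower-level capsule outputs are weighted by their agreement with higher-level capsules rather than discarded, which is the property the abstract points to for preserving spatial and hierarchical structure in the map layers.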