Natural language-based vehicle retrieval is the task of finding a target vehicle within a given image based on a natural language description used as a query. This technology can be applied in various areas, such as police searches for a suspect vehicle. However, the task is challenging because language descriptions are ambiguous and multi-modal data is difficult to process. To tackle this problem, we propose a deep neural network called SBNet that performs natural language-based segmentation for vehicle retrieval. We also propose two task-specific modules to improve performance: a substitution module that helps embed features from different domains in the same space, and a future prediction module that learns temporal information. SBNet was trained on the CityFlow-NL dataset, which contains 2,498 vehicle tracks with three unique natural language descriptions each, and was tested on 530 unique vehicle tracks and their corresponding query sets. SBNet achieved a significant improvement over the baseline in the natural language-based vehicle tracking track of the AI City Challenge 2021.