Natural language (NL) based vehicle retrieval aims to retrieve a specific vehicle given a text description. Unlike image-based vehicle retrieval, NL-based vehicle retrieval must consider not only vehicle appearance but also the surrounding environment and temporal relations. In this paper, we propose a Symmetric Network with Spatial Relationship Modeling (SSM) method for NL-based vehicle retrieval. Specifically, we design a symmetric network to learn unified cross-modal representations of text descriptions and vehicle images, in which both vehicle appearance details and global vehicle trajectory information are preserved. In addition, to make better use of location information, we propose a spatial relationship modeling method that takes the surrounding environment and the mutual relationships between vehicles into consideration. Qualitative and quantitative experiments verify the effectiveness of the proposed method. We achieve a Mean Reciprocal Rank (MRR) of 43.92% on the test set of the natural language-based vehicle retrieval track of the 6th AI City Challenge, ranking 1st among all valid submissions on the public leaderboard. The code is available at https://github.com/hbchen121/AICITY2022_Track2_SSM.
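For readers unfamiliar with the reported metric, MRR averages the reciprocal of the rank at which the correct vehicle track appears for each text query. The following is a minimal sketch of that computation, not the challenge's official evaluation code; the function name, the dictionary layout, and the toy query and track IDs are all assumptions made for illustration.

```python
from typing import Dict, List


def mean_reciprocal_rank(
    rankings: Dict[str, List[str]],
    ground_truth: Dict[str, str],
) -> float:
    """Average of 1/rank of the correct track over all queries.

    rankings:     query id -> track ids sorted from best to worst match
    ground_truth: query id -> the single correct track id
    (Both layouts are hypothetical, chosen only for this sketch.)
    """
    if not rankings:
        return 0.0
    total = 0.0
    for query_id, ranked_ids in rankings.items():
        target = ground_truth[query_id]
        if target in ranked_ids:
            # Ranks are 1-based, so the top result contributes 1.0.
            total += 1.0 / (ranked_ids.index(target) + 1)
        # A query whose target is missing from the list contributes 0.
    return total / len(rankings)


# Toy data: the correct track is ranked 2nd, 1st, and 3rd respectively.
rankings = {
    "q1": ["t3", "t1", "t2"],
    "q2": ["t2", "t1", "t3"],
    "q3": ["t1", "t3", "t2"],
}
ground_truth = {"q1": "t1", "q2": "t2", "q3": "t2"}
example_mrr = mean_reciprocal_rank(rankings, ground_truth)
```

In this toy case the reciprocal ranks are 1/2, 1, and 1/3, so the result is their mean. A score of 43.92% therefore reflects an average reciprocal rank of about 0.44 across the test queries.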