OMG: Observe Multiple Granularities for Natural Language-Based Vehicle Retrieval

Retrieving tracked vehicles by natural language descriptions plays a critical role in smart city construction. The task is to find, among a set of vehicles tracked in surveillance videos, the best match for a given text description. Existing works generally solve it with a dual-stream framework, which consists of a text encoder, a visual encoder and a cross-modal loss function. Although some progress has been made, these methods fail to fully exploit the information available at different levels of granularity. To tackle this issue, we propose a novel framework for the natural language-based vehicle retrieval task, OMG, which Observes Multiple Granularities with respect to visual representation, textual representation and objective functions. For the visual representation, target features, context features and motion features are encoded separately. For the textual representation, one global embedding, three local embeddings and a color-type prompt embedding are extracted to represent various granularities of semantic features. Finally, the overall framework is optimized by a cross-modal multi-granularity contrastive loss function. Experiments demonstrate the effectiveness of our method. Our OMG significantly outperforms all previous methods and ranks 9th on Track 2 of the 6th AI City Challenge. The code is available at https://github.com/dyhBUPT/OMG.
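The abstract does not give the exact form of the cross-modal multi-granularity contrastive loss, so the following is only a minimal sketch under stated assumptions: it uses a symmetric InfoNCE loss, a common choice for dual-stream retrieval, and averages it over several (visual, text) embedding pairs. The function names, the temperature value, and the way visual and textual embeddings are paired are all hypothetical and are not taken from the OMG code.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(visual, text, temperature=0.07):
    # Assumption: row i of `visual` and row i of `text` describe the same vehicle.
    v = l2_normalize(visual)
    t = l2_normalize(text)
    logits = v @ t.T / temperature
    idx = np.arange(len(v))
    loss_v2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # visual -> text
    loss_t2v = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> visual
    return float(0.5 * (loss_v2t + loss_t2v))

def multi_granularity_loss(pairs, temperature=0.07):
    # Hypothetical aggregation: average the loss over all embedding pairs,
    # e.g. (target, global text), (context, local text), (motion, local text).
    return float(np.mean([symmetric_info_nce(v, t, temperature) for v, t in pairs]))

rng = np.random.default_rng(0)
batch, dim = 8, 16
text_global = rng.normal(size=(batch, dim))
pairs = [
    # A well-aligned pair: visual embeddings close to their text embeddings.
    (text_global + 0.01 * rng.normal(size=(batch, dim)), text_global),
    # An unaligned pair: independent random embeddings.
    (rng.normal(size=(batch, dim)), rng.normal(size=(batch, dim))),
]
aligned_loss = symmetric_info_nce(*pairs[0])
random_loss = symmetric_info_nce(*pairs[1])
total_loss = multi_granularity_loss(pairs)
```

In this toy setup the aligned pair yields a much lower loss than the random pair, which is the behavior a contrastive objective is meant to reward. How the actual method weights or pairs the granularities is left to the paper and its repository.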


Related content

Loss function (machine learning)

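In AI, a loss function is sometimes also called a distance function or a metric function. "Distance" here is an abstract notion: it stands for the error between the true data and the predicted data. A loss function estimates how much the model's prediction f(x) disagrees with the true value Y. It is a non-negative real-valued function, usually written L(Y, f(x)); the smaller the loss, the better the model fits the data. The loss function is the core of the empirical risk function and an important component of the structural risk function.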
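To make the relationship between L(Y, f(x)), empirical risk and structural risk concrete, here is a small sketch. It assumes a squared-error loss and an L2 penalty on the model weights as the regularizer; both choices, as well as the function names and the value of `lam`, are illustrative assumptions rather than anything the text above prescribes.

```python
import numpy as np

def squared_loss(y_true, y_pred):
    # L(Y, f(x)) = (Y - f(x))^2, which is non-negative for every sample.
    return (y_true - y_pred) ** 2

def empirical_risk(y_true, y_pred, loss=squared_loss):
    # Empirical risk: the average loss over the observed samples.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(loss(y_true, y_pred)))

def structural_risk(y_true, y_pred, weights, lam=0.1):
    # Structural risk: empirical risk plus a complexity penalty
    # (here an assumed L2 penalty on the weights, scaled by lam).
    penalty = lam * float(np.sum(np.square(weights)))
    return empirical_risk(y_true, y_pred) + penalty

y = [1.0, 2.0, 3.0]
perfect = empirical_risk(y, [1.0, 2.0, 3.0])   # no prediction error
worse = empirical_risk(y, [2.0, 2.0, 5.0])     # errors of 1, 0 and 2
regularized = structural_risk(y, [1.0, 2.0, 3.0], weights=[1.0, 2.0], lam=0.1)
```

A perfect prediction gives zero empirical risk, while the structural risk stays positive whenever the penalty term is non-zero.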
