Enhancing the Role of Context in Region-Word Alignment for Object Detection


Kyle Buettner,Adriana Kovashka

Vision-language pretraining to learn a fine-grained region-word alignment between image-caption pairs has propelled progress in open-vocabulary object detection. We observe that region-word alignment methods are typically applied in detection only with respect to object nouns, and the impact of other rich context in captions, such as attributes, is unclear. In this study, we explore how language context affects downstream object detection and propose to enhance the role of context. In particular, we show how to strategically contextualize the grounding pretraining objective for improved alignment. We further home in on attributes as especially useful object context and propose a novel adjective- and noun-based negative sampling strategy that increases the focus on them in contrastive learning. Overall, our methods enhance object detection when compared to the state of the art in region-word pretraining. We also highlight the fine-grained utility of an attribute-sensitive model through text-region retrieval and phrase grounding analysis.
