Learning the hash representation of multi-view heterogeneous data is an important task in multimedia retrieval. However, existing methods fail to fuse multi-view features effectively and do not exploit the metric information provided by dissimilar samples, which limits retrieval precision. Current methods fuse multi-view features by weighted sum or concatenation; we argue that such fusion cannot capture the interaction among different views. Moreover, these methods ignore the metric information carried by dissimilar samples. We propose a novel deep metric multi-view hashing (DMMVH) method to address these problems. Extensive empirical evidence shows that gate-based fusion outperforms the typical fusion schemes. We further introduce deep metric learning to the multi-view hashing problem, which exploits the metric information of dissimilar samples. On the MIR-Flickr25K, MS COCO, and NUS-WIDE datasets, our method outperforms the current state-of-the-art methods by a large margin (up to 15.28 mean Average Precision (mAP) improvement).
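As a rough illustration of the two ideas named above, the following is a minimal PyTorch sketch, not the authors' exact architecture: the module names, dimensions, and sigmoid-gate form (`GatedMultiViewFusion`, `img_proj`, `txt_proj`) are assumptions for exposition. The gate is conditioned on both views jointly, which is what lets it model cross-view interaction that a fixed weighted sum or plain concatenation cannot; the standard `nn.TripletMarginLoss` at the end shows one common way a metric loss uses dissimilar (negative) samples.

```python
import torch
import torch.nn as nn

class GatedMultiViewFusion(nn.Module):
    """Minimal sketch of gate-based fusion for two views (e.g., image and text).

    Hypothetical layer names and dimensions; the actual DMMVH fusion
    module may differ.
    """

    def __init__(self, img_dim: int, txt_dim: int, fused_dim: int):
        super().__init__()
        # Project each view into a shared space.
        self.img_proj = nn.Linear(img_dim, fused_dim)
        self.txt_proj = nn.Linear(txt_dim, fused_dim)
        # The gate sees both views at once, so the fused feature can
        # depend on their interaction, unlike a fixed weighted sum.
        self.gate = nn.Sequential(
            nn.Linear(img_dim + txt_dim, fused_dim),
            nn.Sigmoid(),
        )

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([img_feat, txt_feat], dim=-1))
        # Element-wise gate interpolates between the two projected views.
        return g * self.img_proj(img_feat) + (1.0 - g) * self.txt_proj(txt_feat)

# Usage: fuse a batch of 512-d image and 300-d text features into 128-d features.
fusion = GatedMultiViewFusion(img_dim=512, txt_dim=300, fused_dim=128)
fused = fusion(torch.randn(8, 512), torch.randn(8, 300))
print(fused.shape)  # torch.Size([8, 128])

# Deep metric learning on the fused features: a triplet margin loss pulls
# similar pairs together and pushes dissimilar (negative) samples apart,
# using the metric information that similarity-only objectives discard.
metric_loss = nn.TripletMarginLoss(margin=1.0)
anchor, positive, negative = fused[:2], fused[2:4], fused[4:6]
loss = metric_loss(anchor, positive, negative)
```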