In recent years, binary code learning, a.k.a. hashing, has received extensive attention in large-scale multimedia retrieval. It aims to encode high-dimensional data points into binary codes, so that the original high-dimensional metric space can be efficiently approximated in Hamming space. However, most existing hashing methods adopt offline batch learning, which is not suitable for incremental datasets with streaming data or new instances. Meanwhile, the robustness of existing online hashing methods remains an open problem, and embedding supervised/semantic information hardly boosts their performance, mainly because the number of categories in supervised learning is unknown in advance. In this paper, we propose an online hashing scheme, termed Hadamard Codebook based Online Hashing (HCOH), which aims to solve the above problems and achieve robust, supervised online hashing. In particular, we first assign to each class label an appropriate high-dimensional binary code, randomly generated from Hadamard codes. Subsequently, locality-sensitive hashing (LSH) is adopted to reduce the length of these Hadamard codes to match the number of hash bits, which adapts the predefined binary codes online and theoretically guarantees semantic similarity. Finally, we consider the setting of stochastic data acquisition, which allows our method to efficiently learn the corresponding hashing functions online via stochastic gradient descent (SGD). Notably, the proposed HCOH can be embedded with supervised labels and is not limited to a predefined number of categories. Extensive experiments on three widely-used benchmarks demonstrate the merits of the proposed scheme over state-of-the-art methods. The code is available at https://github.com/lmbxmu/mycode/tree/master/2018ACMMM_HCOH.
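The three steps summarized above (a Hadamard codebook per class, LSH to shorten the codes, SGD to learn hash functions online) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (see the repository linked above); it rests on several assumptions made only for illustration. The codebook length is a power of two so Sylvester's construction applies; the LSH family is sign-of-Gaussian random projection; the hash function is linear, h(x) = sign(P^T x); and SGD minimizes a tanh-relaxed squared loss sum_k (t_k - tanh(z_k))^2, which may differ from the paper's objective. The helper names `sylvester_hadamard`, `build_codebook`, `lsh_reduce` and `OnlineHasher` are hypothetical. Since distinct Hadamard rows are orthogonal, any two class codes in the codebook differ in exactly half of their bits.

```python
import math
import random


def sylvester_hadamard(n):
    """Return an n x n Hadamard matrix (entries +/-1) via Sylvester's construction.

    Assumption: n is a power of two.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # H_{2m} = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def build_codebook(num_classes, code_dim, seed=0):
    """Hypothetical helper: give each class a distinct, randomly chosen Hadamard row."""
    if num_classes > code_dim:
        raise ValueError("code_dim must be >= num_classes")
    rows = sylvester_hadamard(code_dim)
    rng = random.Random(seed)
    picked = rng.sample(range(code_dim), num_classes)
    return [rows[i] for i in picked]


def lsh_reduce(codebook, num_bits, seed=0):
    """Hypothetical helper: shorten codes to num_bits with sign random projection (an LSH family)."""
    rng = random.Random(seed)
    code_dim = len(codebook[0])
    # One Gaussian projection vector per output bit.
    proj = [[rng.gauss(0.0, 1.0) for _ in range(code_dim)] for _ in range(num_bits)]
    return [
        [1 if sum(w * c for w, c in zip(p, code)) >= 0 else -1 for p in proj]
        for code in codebook
    ]


class OnlineHasher:
    """Hypothetical linear hash h(x) = sign(P^T x), updated one sample at a time by SGD.

    Assumed loss: sum_k (t_k - tanh(z_k))^2 with z = P^T x (illustrative only).
    """

    def __init__(self, in_dim, num_bits, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        # P[j][k]: weight from input feature j to hash bit k (small random init).
        self.P = [[rng.gauss(0.0, 0.01) for _ in range(num_bits)] for _ in range(in_dim)]

    def _project(self, x):
        num_bits = len(self.P[0])
        return [sum(x[j] * self.P[j][k] for j in range(len(x))) for k in range(num_bits)]

    def hash(self, x):
        return [1 if z >= 0 else -1 for z in self._project(x)]

    def sgd_step(self, x, target):
        """One SGD step pulling tanh(P^T x) toward the +/-1 target code of x's class."""
        z = self._project(x)
        for k, (zk, tk) in enumerate(zip(z, target)):
            a = math.tanh(zk)
            # d/dz_k of (t_k - tanh z_k)^2 = -2 (t_k - a)(1 - a^2)
            g = -2.0 * (tk - a) * (1.0 - a * a)
            for j, xj in enumerate(x):
                self.P[j][k] -= self.lr * g * xj


# Toy usage: 3 classes, 8-bit Hadamard codes reduced to 4 hash bits,
# and a "stream" of one-hot 5-dimensional inputs (assumed data).
codebook = build_codebook(num_classes=3, code_dim=8, seed=1)
targets = lsh_reduce(codebook, num_bits=4, seed=2)
hasher = OnlineHasher(in_dim=5, num_bits=4, lr=0.5, seed=3)
stream = [([1.0 if j == i else 0.0 for j in range(5)], i) for i in range(3)]
for _ in range(200):
    for x, label in stream:
        hasher.sgd_step(x, targets[label])
```

Each streaming sample only needs its label to look up a target code, so a class first seen mid-stream can simply be assigned an unused Hadamard row, which mirrors the abstract's point that the category count need not be fixed in advance.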