通过加权模拟距离快速搜索二进制代码 (Fast Search on Binary Codes by Weighted Hamming Distance)

Weighted Hamming distance, as a similarity measure between binary codes and binary queries, provides superior accuracy in search tasks than Hamming distance. However, how to efficiently and accurately find $K$ binary codes that have the smallest weighted Hamming distance to the query remains an open issue. In this paper, a fast search algorithm is proposed to perform the non-exhaustive search for $K$ nearest binary codes by weighted Hamming distance. By using binary codes as direct bucket indices in a hash table, the search algorithm generates a sequence to probe the buckets based on the independence characteristic of the weights for each bit. Furthermore, a fast search framework based on the proposed search algorithm is designed to solve the problem of long binary codes. Specifically, long binary codes are split into substrings and multiple hash tables are built on them. Then, the search algorithm probes the buckets to obtain candidates according to the generated substring indices, and a merging algorithm is proposed to find the nearest binary codes by merging the candidates. Theoretical analysis and experimental results demonstrate that the search algorithm improves the search accuracy compared to other non-exhaustive algorithms and provides orders-of-magnitude faster search than the linear scan baseline.

翻译：作为二进制码和二进制查询之间的类似度度,加权咸明距离在搜索任务中提供了比Hamming距离更高的精度。然而,如何高效率和准确地找到与查询距离最小加权的哈明距离的K$二进制码仍然是一个尚未解决的问题。在本文中,建议采用快速搜索算法,通过加权咸明距离对最接近的二进制码进行非详尽搜索。通过在散列表格中使用二进制码作为直接桶指数,搜索算法产生一个序列,以根据每个位重量的独立特性来探测桶。此外,基于拟议搜索算法的快速搜索框架旨在解决长二进制码的问题。具体地说,长二进制码分为子字符串,并在上面建立多个散列表。然后,搜索算法根据生成的子字符串指数对桶进行探测,并提议合并算法,以通过合并候选人来找到最近的二进制码。理论分析和实验结果表明,搜索算法比其他不完全性基线的搜索算法提高了搜索精确度。