We study the Closest Pair Problem in the Hamming metric, which asks for the pair with the smallest Hamming distance in a collection of binary vectors. We give a new randomized algorithm for the problem on uniformly random input that outperforms previous approaches whenever the dimension of the input points is small compared to the dataset size. For moderate to large dimensions, our algorithm matches the time complexity of the previously best-known algorithms based on locality-sensitive hashing. Technically, our algorithm follows design principles similar to those of Dubiner (IEEE Trans. Inf. Theory 2010) and May-Ozerov (Eurocrypt 2015). Besides improving the time complexity in the small-dimension regime, we significantly simplify the analysis of these previous works. We give a modular analysis, which also allows us to investigate the performance of the algorithm on non-uniform input distributions. Furthermore, we give a proof-of-concept implementation of our algorithm, which performs well in comparison to a quadratic search baseline. This is a first step towards answering an open question raised by May and Ozerov regarding the practicability of algorithms following these design principles.
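To make the problem statement concrete, the following minimal Python sketch shows a quadratic search of the kind used as a comparison baseline: it checks every pair of vectors and keeps the one with the smallest Hamming distance. It is not the algorithm proposed here, and the names `hamming_distance`, `closest_pair_quadratic`, and `random_instance` are hypothetical. Encoding each binary vector as a Python integer is an assumption made for brevity.

```python
import random
from itertools import combinations


def hamming_distance(x: int, y: int) -> int:
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")


def closest_pair_quadratic(vectors):
    """Return (distance, i, j) for a closest pair, comparing all pairs.

    Illustrative baseline only: it performs n*(n-1)/2 distance
    computations for n input vectors.
    """
    if len(vectors) < 2:
        raise ValueError("need at least two vectors")
    best = None
    for (i, x), (j, y) in combinations(enumerate(vectors), 2):
        dist = hamming_distance(x, y)
        if best is None or dist < best[0]:
            best = (dist, i, j)
    return best


def random_instance(n: int, d: int, seed: int = 0):
    """n uniformly random d-bit vectors, encoded as integers (assumed encoding)."""
    rng = random.Random(seed)
    return [rng.getrandbits(d) for _ in range(n)]
```

For example, `closest_pair_quadratic(random_instance(1000, 64))` returns the distance and indices of one closest pair among 1000 random 64-bit vectors. The number of comparisons grows quadratically with the dataset size.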