The k-nearest neighbor search is used in various applications such as machine learning, computer vision, database search, and information retrieval. While the computational cost of the exact nearest neighbor search is enormous, an approximate nearest neighbor search (ANNS) has been attracting much attention. IVFPQ is one of the ANNS methods. Although we can leverage the high bandwidth and low latency of shared memory to compute the search phase of the IVFPQ on NVIDIA GPUs, the throughput can degrade due to shared memory bank conflict. To reduce the bank conflict and improve the search throughput, we propose a custom 8-bit floating point value format. This format doesn't have a sign bit and can be converted from/to FP32 with a few instructions. We use this format for IVFPQ on GPUs and achieved better performance without significant recall loss compared to FP32 and FP16.
翻译:k 最近的邻居搜索被用于机器学习、计算机视觉、数据库搜索和信息检索等各种应用程序。 虽然近邻搜索的计算成本非常巨大, 但近邻搜索( ANNS) 的计算成本非常高, 吸引了人们的注意。 IVFPQ 是 ANNS 方法之一 。 尽管我们可以利用高带宽和低延迟的共享内存来计算 NVIDIA GPUs 上的IVFPQ 搜索阶段, 但是由于共享的内存银行冲突, 吞吐量可以降解。 为了减少银行冲突, 改进搜索量, 我们建议了一种定制的 8- 位浮动点值格式。 这个格式没有符号, 并且可以转换为 FP32, 并有几处指示 。 我们用这个格式来计算 GPUs 上的 IVFPQ, 并在不与 FP32 和 FP16 相比出现重大回扣损失的情况下实现更好的业绩 。