Exact Nearest-Neighbor Search on Energy-Efficient FPGA Devices

This paper investigates the use of FPGA devices for energy-efficient exact kNN search in high-dimensional latent spaces. The work follows a relevant trend: supporting the growing popularity of learned representations based on neural encoder models by making their large-scale adoption greener and more inclusive. The paper proposes two energy-efficient solutions that share the same low-level FPGA configuration. The first maximizes system throughput by processing the queries of a batch in parallel over a streamed dataset that does not fit into the FPGA memory. The second minimizes latency by processing each incoming kNN query in parallel over an in-memory dataset. Reproducible experiments on publicly available image and text datasets show that the proposed solutions outperform state-of-the-art CPU-based competitors in throughput, latency, and energy consumption. Specifically, the FPGA solutions achieve the best throughput in queries per second and the best observed latency, with scale-up factors of up to 16.6X. The same holds for energy efficiency, where the solutions achieve up to 11.9X energy savings with respect to strong CPU-based competitors.
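To make the first, throughput-oriented scheme more concrete, the following is a minimal software sketch of exact kNN over a streamed dataset: every query of a batch is compared against each chunk as it arrives, and a bounded heap per query keeps the k best candidates seen so far. It is an illustration only, not the paper's FPGA design. The function names, the chunked `(id, vector)` input format, and the choice of squared Euclidean distance are assumptions; the abstract does not specify the distance metric or the data layout.

```python
import heapq
from typing import Iterable, List, Sequence, Tuple

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (assumed metric, for illustration)."""
    return sum((x - y) * (x - y) for x, y in zip(a, b))


def streamed_exact_knn(
    queries: List[Vector],
    chunks: Iterable[List[Tuple[int, Vector]]],
    k: int,
) -> List[List[Tuple[float, int]]]:
    """Exact kNN for a batch of queries over a dataset read chunk by chunk.

    Each chunk is a list of (vector_id, vector) pairs. Only the current
    chunk and one k-sized heap per query are held at any time.
    Returns, per query, a list of (distance, vector_id) sorted by distance.
    """
    # heapq is a min-heap, so distances are stored negated: the root of
    # each heap is then the current worst (largest-distance) candidate.
    heaps: List[List[Tuple[float, int]]] = [[] for _ in queries]

    for chunk in chunks:
        for vec_id, vec in chunk:
            # Hardware would evaluate the batch in parallel; here it is a loop.
            for heap, query in zip(heaps, queries):
                dist = squared_l2(query, vec)
                if len(heap) < k:
                    heapq.heappush(heap, (-dist, vec_id))
                elif -dist > heap[0][0]:
                    # Strictly closer than the current worst: replace it.
                    heapq.heapreplace(heap, (-dist, vec_id))

    return [sorted((-neg, vec_id) for neg, vec_id in heap) for heap in heaps]


if __name__ == "__main__":
    # Toy dataset split into two chunks, standing in for a stream.
    stream = [
        [(0, [0.0, 0.0]), (1, [1.0, 0.0])],
        [(2, [0.0, 2.0]), (3, [3.0, 3.0])],
    ]
    batch = [[0.0, 0.0], [3.0, 2.0]]
    for result in streamed_exact_knn(batch, stream, k=2):
        print(result)
```

Because every vector is compared against every query, the result is exact rather than approximate; the cost is a full scan of the dataset per batch, which is why batching queries amortizes the streaming pass. The second, latency-oriented scheme would instead split an in-memory dataset across parallel workers for a single query and merge their partial top-k lists.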
