Recent advances in graph processing on FPGAs promise to alleviate the performance bottlenecks caused by irregular memory access patterns. Such bottlenecks limit performance in a growing number of important application areas, such as machine learning and data analytics. While FPGAs offer a promising solution through flexible memory hierarchies and massive parallelism, we argue that current graph processing accelerators either use off-chip memory bandwidth inefficiently or do not scale well across memory channels. In this work, we propose GraphScale, a scalable graph processing framework for FPGAs. For the first time, GraphScale combines multi-channel memory with asynchronous graph processing (for fast convergence on results) and a compressed graph representation (for efficient use of memory bandwidth and a reduced memory footprint). GraphScale solves common graph problems such as breadth-first search, PageRank, and weakly connected components through modular user-defined functions, a novel two-dimensional partitioning scheme, and a high-performance two-level crossbar design.
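To illustrate why asynchronous processing over a compressed representation can converge faster, the following is a minimal software sketch, not GraphScale's hardware design. It assumes a compressed sparse row (CSR) layout as the compressed representation, which the abstract does not specify, and computes weakly connected components by minimum-label propagation. The names `build_csr` and `wcc` are hypothetical. In asynchronous mode, a label written earlier in an iteration is visible to vertices processed later in the same iteration; in synchronous mode, every vertex reads only the labels from the previous iteration.

```python
from typing import List, Tuple


def build_csr(num_vertices: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build CSR arrays (offsets, neighbors) for the undirected view of `edges`.

    CSR is an assumed stand-in for a compressed graph representation.
    """
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # offsets[v] .. offsets[v + 1] delimits the neighbor slice of vertex v.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    neighbors = [0] * offsets[-1]
    cursor = offsets[:-1].copy()
    for u, v in edges:
        neighbors[cursor[u]] = v
        cursor[u] += 1
        neighbors[cursor[v]] = u
        cursor[v] += 1
    return offsets, neighbors


def wcc(offsets: List[int], neighbors: List[int], asynchronous: bool) -> Tuple[List[int], int]:
    """Label each vertex with the smallest vertex id in its component.

    Returns the labels and the number of passes over the graph,
    including the final pass that detects no change.
    """
    n = len(offsets) - 1
    labels = list(range(n))
    iterations = 0
    changed = True
    while changed:
        changed = False
        iterations += 1
        # Asynchronous: read the live array, so updates propagate within a pass.
        # Synchronous: read a snapshot taken at the start of the pass.
        source = labels if asynchronous else labels.copy()
        for v in range(n):
            best = source[v]
            for i in range(offsets[v], offsets[v + 1]):
                best = min(best, source[neighbors[i]])
            if best < labels[v]:
                labels[v] = best
                changed = True
    return labels, iterations


# A 5-vertex path (0-1-2-3-4) plus a separate edge (5-6).
offsets, neighbors = build_csr(7, [(0, 1), (1, 2), (2, 3), (3, 4), (5, 6)])
async_labels, async_iters = wcc(offsets, neighbors, asynchronous=True)
sync_labels, sync_iters = wcc(offsets, neighbors, asynchronous=False)
print(async_labels, async_iters, sync_iters)
```

On this small example both modes produce the same labels, but the asynchronous variant needs fewer passes because the minimum label travels along the whole path within a single pass. The size of that gap depends on the graph and the vertex processing order, so this is only an intuition for the convergence benefit, not a performance claim.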