FPGAs are emerging as computing infrastructure for accelerating applications in datacenters, and high-level synthesis (HLS) tools have been proposed to ease their programming. Even with HLS, irregular data-intensive applications require explicit optimizations. A common one is to instantiate multiple processing elements (PEs), each owning a private BRAM-based buffer, so that multiple data items are processed per cycle. Data routing, which dynamically dispatches data items to their designated PEs, avoids the buffer replication that statically assigning data to PEs requires, and hence saves BRAM. However, workload imbalance among PEs severely degrades performance on skewed datasets. In this paper, we propose a skew-oblivious data routing architecture that allocates secondary PEs and schedules them at run-time to share the workload of overloaded PEs. In addition, we integrate the proposed architecture into a framework called Ditto to minimize the development effort for applications that require skew handling. We evaluate Ditto on five commonly used applications: histogram building, data partitioning, PageRank, heavy hitter detection, and HyperLogLog. The results demonstrate that the generated implementations are robust to skewed datasets and outperform state-of-the-art designs in both throughput and BRAM usage efficiency.
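To make the imbalance problem concrete, the following is a toy software model, not the Ditto architecture itself. It assumes keys are routed to PEs by `key % num_pes`, and that each secondary PE can take an equal share of one primary PE's work; the function names `route` and `route_with_helpers` and the greedy assignment policy are illustrative assumptions, whereas the proposed architecture schedules secondary PEs at run-time in hardware.

```python
from collections import Counter


def route(keys, num_pes):
    """Data routing (assumed policy): key k goes to PE (k mod num_pes).

    Returns the number of items each PE must process.
    """
    load = Counter(k % num_pes for k in keys)
    return [load[p] for p in range(num_pes)]


def route_with_helpers(keys, num_pes, num_secondary):
    """Toy skew handling: hand secondary PEs to the most loaded primaries.

    Hypothetical greedy policy: each secondary PE joins the primary PE whose
    current per-worker share is largest, and the work is split evenly.
    Returns the per-worker load of each primary PE group.
    """
    loads = route(keys, num_pes)
    helpers = [0] * num_pes
    for _ in range(num_secondary):
        target = max(range(num_pes), key=lambda p: loads[p] / (1 + helpers[p]))
        helpers[target] += 1
    # Ceiling division: the slowest worker in a group bounds its finish time.
    return [-(-loads[p] // (1 + helpers[p])) for p in range(num_pes)]


# A skewed input: key 0 dominates the stream.
skewed_keys = [0] * 90 + list(range(1, 11))
baseline = route(skewed_keys, num_pes=4)
balanced = route_with_helpers(skewed_keys, num_pes=4, num_secondary=3)
print(max(baseline), max(balanced))
```

In this model the most loaded PE bounds the total processing time, so lowering the maximum per-worker load is what recovers throughput. The model ignores how partial results held by secondary PEs would be merged back, which a real design must handle.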