Designing, Modeling, and Optimizing Data-Intensive Computing Systems

The cost of moving data between memory units and compute units is a major contributor to the execution time and energy consumption of modern workloads. At the same time, an enormous amount of data is being generated across multiple application domains. These trends suggest a need for a paradigm shift towards a data-centric approach, where computation is performed close to where the data resides. Further, a data-centric approach can enable a data-driven view, in which the vast amounts of available data are used to improve architectural decisions. As a step towards such architectures, this dissertation contributes to various aspects of the data-centric approach and proposes several data-driven mechanisms. First, we design NERO, a data-centric accelerator for a real-world weather prediction application. Second, we explore the applicability of different number formats, including fixed-point, floating-point, and posit, for different stencil kernels. Third, we propose NAPEL, an ML-based application performance and energy prediction framework for data-centric architectures. Fourth, we present LEAPER, the first use of few-shot learning to transfer FPGA-based computing models across different hardware platforms and applications. Fifth, we propose Sibyl, the first reinforcement learning-based mechanism for data placement in hybrid storage systems. Overall, this thesis provides two key conclusions: (1) hardware acceleration on an FPGA+HBM fabric is a promising solution to overcome the data movement bottleneck of current computing systems; (2) data should drive system and design decisions, leveraging inherent data characteristics to make computing systems more efficient.
