Characterization and Architectural Implications of Big Data Workloads

Big data is expanding rapidly in both workloads and runtime systems, which poses a serious challenge to workload characterization, the foundation of innovative system and architecture design. Previous major big data benchmarking efforts either propose a comprehensive but very large set of workloads, or select only a few workloads according to so-called popularity, which may lead to partial or even biased observations. In this paper, starting from a comprehensive big data benchmark suite, BigDataBench, we reduce 77 workloads to 17 representative workloads from a micro-architectural perspective. On a typical state-of-practice platform, the Intel Xeon E5645, we compare the representative big data workloads with SPECINT, SPECFP, PARSEC, CloudSuite and HPCC. A comprehensive workload characterization yields the following observations. First, big data workloads are data-movement-dominated computing with more branch operations, which account for up to 92% of the instruction mix; this places them in a different class from desktop (SPEC CPU2006), CMP (PARSEC), and HPC (HPCC) workloads. Second, corroborating previous work, Hadoop- and Spark-based big data workloads have higher front-end stalls. Compared with traditional workloads, i.e., PARSEC, the big data workloads have larger instruction footprints. However, in addition to varied instruction-level parallelism, there are significant disparities in front-end efficiency among different big data workloads. Third, we find that complex software stacks that fail to use state-of-practice processors efficiently are one of the main factors leading to high front-end stalls. For the same workloads, L1I cache miss rates differ by one order of magnitude among implementations with different software stacks.
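
The abstract does not say how the 77 workloads are reduced to 17. As a rough illustration of what such a reduction could look like, the sketch below assumes one plausible approach: standardize per-workload micro-architectural metrics, cluster them with k-means, and keep the workload nearest each cluster center. The workload names, the two metrics, and all numbers are hypothetical and are not taken from the paper.

```python
from math import dist


def zscore_columns(rows):
    """Standardize each metric (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iterations=50):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from all current ones.
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            [sum(col) / len(col) for col in zip(*members)] if members
            else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def pick_representatives(metrics, k):
    """Return one workload name per cluster: the one closest to the centroid."""
    names = list(metrics)
    points = zscore_columns([metrics[n] for n in names])
    reps = set()
    for c in kmeans(points, k):
        idx = min(range(len(points)), key=lambda i: dist(points[i], c))
        reps.add(names[idx])
    return sorted(reps)


# Hypothetical metric vectors: [L1I misses per kilo-instruction, branch ratio].
metrics = {
    "wl_a": [30.0, 0.20], "wl_b": [32.0, 0.21],
    "wl_c": [5.0, 0.10], "wl_d": [6.0, 0.11],
    "wl_e": [1.0, 0.30], "wl_f": [1.5, 0.31],
}
representatives = pick_representatives(metrics, k=3)
print(representatives)
```

With these made-up inputs the six workloads fall into three well-separated pairs, so one workload from each pair is kept. A real study would use many more metrics per workload and would need to justify the choice of k.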

