For many, Graphics Processing Units (GPUs) provide a source of reliable computing power. Recently, Nvidia introduced its 9th-generation HPC-grade GPU, the Ampere A100, claiming significant performance improvements over previous generations, particularly for AI workloads, and introducing new architectural features such as asynchronous data movement. But how well does the A100 perform on non-AI benchmarks, and can we expect the A100 to deliver the application improvements we have grown used to with previous GPU generations? In this paper, we benchmark the A100 GPU and compare it to four previous generations of GPUs, with a particular focus on empirically quantifying our derived performance expectations and, should those expectations go unmet, investigating whether the newly introduced data-movement features can offset any loss in performance. We find that the A100 delivers a smaller performance increase than previous generations on the well-known Rodinia benchmark suite; we show that some of these performance anomalies can be remedied through clever use of the new data-movement features, which we microbenchmark and for which we demonstrate where (and, more importantly, how) they should be used.
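The asynchronous data movement referenced above is exposed to CUDA programs (from CUDA 11 onward) through the cooperative-groups memcpy_async API. As a minimal, illustrative sketch (not code from the paper; the kernel name and tile size are our assumptions), the pattern below stages a tile of global memory into shared memory asynchronously; on the A100 (sm_80) the copy lowers to the hardware cp.async path that bypasses the register file, while older GPUs fall back to an ordinary synchronous copy:

```cuda
#include <cooperative_groups.h>
#include <cooperative_groups/memcpy_async.h>

namespace cg = cooperative_groups;

constexpr int TILE = 256;  // assumed block size, for illustration only

__global__ void scale_async(const float* __restrict__ in,
                            float* __restrict__ out,
                            float factor, int n) {
    __shared__ float tile[TILE];
    cg::thread_block block = cg::this_thread_block();
    int base = blockIdx.x * TILE;
    if (base + TILE > n) return;  // sketch assumes n is a multiple of TILE

    // Stage one tile from global to shared memory asynchronously.
    // On sm_80 this compiles to cp.async; on earlier architectures
    // it degrades to a synchronous element-wise copy.
    cg::memcpy_async(block, tile, in + base, sizeof(float) * TILE);

    // Block until the staged copy is visible to all threads in the block.
    cg::wait(block);

    out[base + threadIdx.x] = tile[threadIdx.x] * factor;
}
```

Compiled with `nvcc -arch=sm_80` and launched as, e.g., `scale_async<<<n / TILE, TILE>>>(d_in, d_out, 2.0f, n)`, this is the staging pattern whose placement (where, and how) the microbenchmarks in this paper examine.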