Using PHAST to port Caffe library: First experiences and lessons learned

Performance has always been a hot topic in computing, but the viable ways to achieve it have taken many forms over the course of computing history. Today, technological limits have pushed the adoption of increasingly parallel multi-core and many-core architectures, and even of highly specialized hardware (Domain-Specific Architectures, or DSAs) designed to solve very specific problems. In this new context, one major problem is how to develop software once and run it seamlessly on multiple accelerator architectures. Ideally, a single programming model would automatically target the code to different kinds of parallel architectures and allow architecture-specific tuning with minimal, if any, changes to the source code, in pursuit of performance portability. A comprehensive solution to this problem is still lacking. In this work, we present the use of the PHAST Library, which allows users to code once, at a high level of abstraction and thus with high productivity, and to automatically target different parallel devices by changing the compilation process. As a case study, we have worked on porting the well-known deep-learning framework Caffe. The framework has been split into different parts and some of them have been ported, yielding a working, straightforward implementation that runs on both CPUs and GPUs. We conclude by discussing the lessons learned during the porting process and by analyzing the obtained performance, with a view to completing the port and extending it in future work.
