Nowadays, we are living in an era of extreme device heterogeneity. Beyond the wide variety of conventional CPU architectures, accelerator devices such as GPUs and FPGAs have come to the foreground, greatly expanding the pool of available solutions for executing applications. However, choosing the appropriate device for each application's needs is an extremely challenging task due to the abstract relationship between hardware and software. Accurate automatic optimization algorithms are required to cope with the complexity and variety of current hardware and software; until now, achieving optimal execution has relied on time-consuming trial-and-error approaches. Machine learning (ML) and Natural Language Processing (NLP) have flourished over the last decade, with research focusing on deep architectures. In this context, applying NLP techniques to source code in order to conduct autotuning tasks is an emerging field of study. In this paper, we extend the work of Cummins et al., namely Deeptune, which tackles the problem of optimal device selection (CPU or GPU) for accelerated OpenCL kernels. We identify three major limitations of Deeptune and, based on these, we propose four different DNN models that capture enhanced contextual information from source code. Experimental results show that our proposed methodology surpasses that of Cummins et al., providing up to 4\% improvement in prediction accuracy.
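For illustration, the sketch below shows a Deeptune-style pipeline of the kind the abstract describes: a token embedding followed by stacked LSTMs that classify an OpenCL kernel as CPU- or GPU-favorable. This is a minimal assumed example, not the authors' exact architecture; the vocabulary size, sequence length, and layer widths are placeholder assumptions.

```python
# Minimal sketch (assumed, not the paper's exact model): a sequence
# classifier that maps tokenized OpenCL kernels to a CPU/GPU decision.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 128   # assumed OpenCL token vocabulary size
MAX_LEN = 1024     # assumed kernel length after padding/truncation

model = keras.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 64),        # learn token embeddings
    layers.LSTM(64, return_sequences=True),  # sequential source context
    layers.LSTM(64),
    layers.Dense(32, activation="relu"),
    layers.Dense(2, activation="softmax"),   # class 0 = CPU, 1 = GPU
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Usage example on one dummy tokenized kernel (random token ids):
dummy_kernel = np.random.randint(0, VOCAB_SIZE, size=(1, MAX_LEN))
print(model.predict(dummy_kernel).argmax(axis=-1))
```

The proposed models in the paper extend this baseline with richer contextual representations of the source; the sketch only fixes the overall input-to-decision shape of the task.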