Over the past few years, self-attention has risen to prominence in deep learning, especially in the domain of natural language processing (NLP). Its impressive effectiveness, together with its ubiquitous implementations, has aroused our interest in efficiently scheduling the data flow of the corresponding computations onto architectures with many computing units so as to realize parallel computing. In this paper, based on the theory of the self-attention mechanism and state-of-the-art realizations of self-attention in language models, we propose a general scheduling algorithm, derived from the optimal schedules for small instances found by a Boolean satisfiability (SAT) solver, to parallelize the typical computations of self-attention. We also put forward strategies for further optimization by skipping redundant computations, with which reductions of almost 25% and 50% of the original computations are achieved for two widely adopted application schemes of self-attention, respectively. Adopting the proposed optimizations, we correspondingly derive two further scheduling algorithms. The proposed algorithms are applicable regardless of problem size, as long as the number of input vectors is divisible by the number of computing units available in the architecture. Owing to the difficulty of proving the correctness of the algorithms mathematically for general cases, we have instead conducted experiments that confirm their validity, demonstrating the quality of the schedules they produce by comparison against optimal solutions of SAT problems for particular instances.
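For reference, the "typical computations of self-attention" targeted by the scheduling algorithm can be read as those of the standard scaled dot-product formulation; the equation below is the textbook definition, not notation introduced by this paper, with $Q, K, V \in \mathbb{R}^{n \times d_k}$ denoting the query, key, and value matrices built from the $n$ input vectors and $d_k$ the key dimension:

\[ \operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V . \]

As one illustrative (assumed, not stated in the abstract) reading of the reported savings: in a causally masked, autoregressive application scheme, only the lower triangle of the $n \times n$ score matrix $QK^{\top}$ is ever used, so roughly half of its entries need not be computed at all, which is consistent in magnitude with the stated reduction of almost 50%.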