We revisit non-blocking simultaneous multithreading (NB-SMT), introduced previously by Shomron and Weiser (2020). NB-SMT trades accuracy for performance by occasionally "squeezing" more than one thread into a shared multiply-and-accumulate (MAC) unit. However, accommodating more than one thread in a shared MAC unit may introduce noise into the computations, thereby changing the internal statistics of the model. We show that a substantial portion of the lost model accuracy can be recouped by post-training recalibration of the batch normalization layers' running mean and running variance statistics in the presence of NB-SMT.
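As a minimal sketch of the kind of post-training recalibration described above, the following PyTorch snippet re-estimates the BatchNorm running statistics by forwarding a small calibration set through a model whose forward pass is assumed to already emulate NB-SMT-induced noise. The function name `recalibrate_bn`, the `calib_loader` argument, and the batch budget are illustrative choices, not artifacts of the original work.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def recalibrate_bn(model: nn.Module, calib_loader, num_batches: int = 100) -> nn.Module:
    """Re-estimate BatchNorm running mean/variance under NB-SMT-induced noise.

    Assumes `model`'s forward pass already reflects NB-SMT behavior
    (simulated or in hardware), so the recalibrated statistics match the
    noisy activation distributions seen at inference time.
    """
    model.eval()  # keep dropout etc. in inference mode

    # Reset running statistics and put only the BN layers in training mode,
    # so their running mean/variance are re-accumulated from calibration data.
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.reset_running_stats()
            m.momentum = None  # None => cumulative moving average over all batches
            m.train()

    for i, (images, _) in enumerate(calib_loader):
        if i >= num_batches:
            break
        model(images)  # forward pass only; weights are never updated

    model.eval()
    return model
```

Only the BN buffers change here; the learned weights, biases, and BN affine parameters are left untouched, which is what makes this a post-training (no fine-tuning) procedure.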