Modern graphics processing units (GPUs) are designed and optimized to perform highly parallel numerical calculations. This parallelism has delivered (and continues to promise) significant advantages, both in energy efficiency and in computational throughput. In this document, we survey the different applications of mixed precision. We recall the standards currently used for numerical computation in the overwhelming majority of systems. We show that mixed precision, which reduces the precision of an operation's inputs, does not necessarily reduce the precision of its output. We then show that this principle carries over to one of the fields that most needs computing power: machine learning. The use of fixed-point numbers and of half precision are two very effective ways to increase the learning capacity of complex neural networks. Mixed precision still requires suitable hardware, without which computation time may actually increase. The NVIDIA Tensor Core, found among others in the Tesla V100 range, is an example of a hardware-level implementation of mixed precision. Finally, by abandoning the traditional von Neumann model, mixed precision can also be taken to a lower level of abstraction, using phase-change memories.
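The central claim above, that lowering the precision of an operation's inputs need not lower the precision of its output, can be illustrated with a minimal sketch. The example below is not from the original document: it uses Python's standard `struct` module (format codes `'e'` and `'f'`) to round values to IEEE-754 half and single precision, and compares two ways of computing a dot product of half-precision inputs, one accumulating in FP16 and one accumulating in FP32, the latter mimicking the FP16-multiply/FP32-accumulate scheme used by Tensor Cores.

```python
import random
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 half-precision value."""
    return struct.unpack('e', struct.pack('e', x))[0]


def to_single(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack('f', struct.pack('f', x))[0]


random.seed(0)
n = 10_000
# Inputs are stored in half precision (the "low-precision inputs").
a = [to_half(random.gauss(0.0, 1.0)) for _ in range(n)]
b = [to_half(random.gauss(0.0, 1.0)) for _ in range(n)]

# Reference result: half-precision values are exactly representable in
# double precision, so this sum is computed at full double precision.
ref = sum(x * y for x, y in zip(a, b))

# All-half pipeline: each product and the running sum are rounded to FP16.
half_acc = 0.0
for x, y in zip(a, b):
    half_acc = to_half(half_acc + to_half(x * y))

# Mixed-precision pipeline: FP16 inputs, FP32 accumulator. The product of
# two FP16 values is exact in FP32, so only the additions round.
mixed_acc = 0.0
for x, y in zip(a, b):
    mixed_acc = to_single(mixed_acc + x * y)

print("FP16-accumulated error:", abs(half_acc - ref))
print("FP32-accumulated error:", abs(mixed_acc - ref))
```

Despite both pipelines reading identical half-precision inputs, the FP32 accumulator loses far less accuracy over the 10,000 additions, which is exactly the principle the abstract describes.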