Deep learning has grown rapidly thanks to its state-of-the-art performance across a wide range of real-world applications. While neural networks have traditionally been trained using IEEE-754 binary32 arithmetic, the rapid growth of computational demands in deep learning has boosted interest in faster, low-precision training. Mixed-precision training that combines IEEE-754 binary16 with IEEE-754 binary32 has been explored, and other $16$-bit formats, for example Google's bfloat16, have become popular. In floating-point arithmetic there is a tradeoff between precision and representation range as the number of exponent bits changes; denormal numbers extend the representation range. This raises questions of how much exponent range is needed, of whether there is a format between binary16 (5 exponent bits) and bfloat16 (8 exponent bits) that works better than either of them, and of whether denormal numbers are necessary. In this paper we study the need for denormal numbers in mixed-precision training, and we propose a 1/6/9 format, i.e., 1 sign bit, a 6-bit exponent, and a 9-bit explicit mantissa, that offers a better range-precision tradeoff. We show that 1/6/9 mixed-precision training can speed up training on hardware that incurs a performance slowdown on denormal operations, or can eliminate the need for denormal numbers altogether. Moreover, for a number of fully connected and convolutional neural networks in computer vision and natural language processing, 1/6/9 achieves numerical parity with standard mixed-precision training.
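To make the range-precision tradeoff concrete, the sketch below compares the three $16$-bit layouts mentioned above. It assumes an IEEE-754-style encoding (bias $2^{w-1}-1$, the all-ones exponent reserved for Inf/NaN, a hidden leading 1, and subnormal support); the exact encoding of the proposed 1/6/9 format may differ, and the helper `format_stats` is hypothetical, introduced only for illustration.

```python
# Rough range/precision comparison for candidate 16-bit formats.
# Assumption: IEEE-754-style encoding with bias = 2**(w-1) - 1, the all-ones
# exponent reserved for Inf/NaN, m explicit mantissa (fraction) bits with a
# hidden leading 1, and gradual underflow via subnormals. This is a sketch of
# the exponent-vs-mantissa tradeoff, not the paper's exact 1/6/9 encoding.

def format_stats(exp_bits, man_bits):
    bias = 2 ** (exp_bits - 1) - 1
    emax = bias                               # largest usable exponent
    emin = 1 - bias                           # smallest normal exponent
    max_normal = (2 - 2.0 ** -man_bits) * 2.0 ** emax
    min_normal = 2.0 ** emin
    min_denormal = 2.0 ** (emin - man_bits)   # smallest positive subnormal
    eps = 2.0 ** -man_bits                    # spacing between 1.0 and the next value
    return max_normal, min_normal, min_denormal, eps

formats = {
    "binary16 (1/5/10)": (5, 10),
    "1/6/9":             (6, 9),
    "bfloat16 (1/8/7)":  (8, 7),
}

for name, (e, m) in formats.items():
    mx, mn, dn, eps = format_stats(e, m)
    print(f"{name:18s} max~{mx:.3g}  min normal~{mn:.3g}  "
          f"min denormal~{dn:.3g}  eps~{eps:.3g}")
```

Under these assumptions, binary16 tops out near 65504 but resolves values down to about 6e-5 (6e-8 with denormals), bfloat16 spans roughly 1e-38 to 3e38 with far coarser precision, and a 6-bit exponent with 9 mantissa bits sits between the two in both range and precision.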