Modern deep learning models require immense computational resources, motivating research into low-precision training. Quantised training addresses this by representing training components in low-bit integers, but typically relies on discretising real-valued updates. We introduce an alternative approach where the update rule itself is discrete, avoiding the quantisation of continuous updates by design. We establish convergence guarantees for a general class of such discrete schemes, and present a multinomial update rule as a concrete example, supported by empirical evaluation. This perspective opens new avenues for efficient training, particularly for models with inherently discrete structure.
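To make the idea concrete, here is a minimal sketch of what a discrete-by-design update could look like, not the paper's actual algorithm: each integer weight receives a step drawn from a three-outcome multinomial over {-1, 0, +1}, so no real-valued update is ever computed and then quantised. The function name, the probability mapping, and the toy objective below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def multinomial_step(int_weights, grads, lr=0.1):
    """Sample a discrete update in {-1, 0, +1} per weight; weights stay integers.

    Illustrative sketch only -- the probability mapping is an assumption,
    not the rule analysed in the paper.
    """
    # Probability of taking a step grows with the gradient magnitude, capped at 1.
    p_move = np.clip(lr * np.abs(grads), 0.0, 1.0)
    move = rng.random(int_weights.shape) < p_move   # Bernoulli draw: step or stay
    step = -np.sign(grads).astype(np.int64)         # unit step in the descent direction
    return int_weights + np.where(move, step, 0)

# Toy usage: minimise f(w) = ||w||^2 with integer-valued weights.
w = np.array([7, -4, 2], dtype=np.int64)
for _ in range(50):
    grad = 2.0 * w                # gradient of the toy objective
    w = multinomial_step(w, grad)
print(w)  # drifts toward the all-zeros minimiser in expectation
```

The point of the sketch is only that the randomness lives in the update itself: the expected step tracks the negative gradient, while every realised step is already a low-bit integer.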