Low-precision arithmetic has had a transformative effect on the training of neural networks, reducing computation, memory, and energy requirements. However, despite its promise, low-precision arithmetic has received little attention for Gaussian processes (GPs), largely because GPs require sophisticated linear algebra routines that are unstable in low precision. We study the different failure modes that can occur when training GPs in half precision. To circumvent these failure modes, we propose a multi-faceted approach involving conjugate gradients with re-orthogonalization, mixed precision, and preconditioning. Our approach significantly improves the numerical stability and practical performance of conjugate gradients in low precision over a wide range of settings, enabling GPs to train on $1.8$ million data points in $10$ hours on a single GPU, without any sparse approximations.
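Below is a minimal sketch (not the authors' implementation) of how the three ingredients named in the abstract can fit together: preconditioned conjugate gradients whose expensive matrix-vector products run in half precision, with inner products and scalar updates accumulated in single precision, and with residual re-orthogonalization to counteract the loss of orthogonality in float16. The Jacobi (diagonal) preconditioner and the function name `solve_cg_fp16` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def solve_cg_fp16(K, y, max_iters=100, tol=1e-4):
    """Approximately solve K x = y for a symmetric PSD kernel matrix K."""
    K16 = K.astype(np.float16)                  # low-precision copy for matvecs
    Pinv = 1.0 / np.diag(K).astype(np.float32)  # Jacobi preconditioner (stand-in)

    x = np.zeros_like(y, dtype=np.float32)
    r = y.astype(np.float32).copy()             # residual r = y - K x (x = 0)
    z = Pinv * r
    p = z.copy()
    R = []                                      # stored residuals for re-orthogonalization

    for _ in range(max_iters):
        # Matvec in float16, result promoted to float32 for the scalar recurrences.
        Kp = (K16 @ p.astype(np.float16)).astype(np.float32)
        rz = float(r @ z)
        alpha = rz / float(p @ Kp)
        x += alpha * p
        r -= alpha * Kp

        # Re-orthogonalize the new residual against previous (normalized) residuals
        # via Gram-Schmidt, compensating for round-off in half precision.
        for q in R:
            r -= (r @ q) * q
        if np.linalg.norm(r) < tol * np.linalg.norm(y):
            break
        R.append(r / np.linalg.norm(r))

        z = Pinv * r
        beta = float(r @ z) / rz
        p = z + beta * p
    return x
```

NumPy is used here only to keep the sketch self-contained; on a GPU framework the half-precision matvec is where the speed and memory savings actually come from, and a pivoted-Cholesky or similar preconditioner would typically replace the diagonal one.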