Most applications of transformers to mathematics, from integration to theorem proving, focus on symbolic computation. In this paper, we show that transformers can be trained to perform numerical calculations with high accuracy. We consider problems of linear algebra: matrix transposition, addition, multiplication, eigenvalues and vectors, singular value decomposition, and inversion. Training small transformers (up to six layers) over datasets of random matrices, we achieve high accuracies (over 90%) on all problems. We also show that trained models can generalize out of their training distribution, and that out-of-domain accuracy can be greatly improved by working from more diverse datasets (in particular, by training from matrices with non-independent and identically distributed coefficients). Finally, we show that few-shot learning can be leveraged to re-train models to solve larger problems.
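As an informal illustration of the setup described above (small transformers trained on sequences derived from random matrices), the sketch below shows one plausible way to flatten a matrix into a token sequence. The sign/mantissa/exponent encoding and the helper names `encode_number` and `encode_matrix` are assumptions made for illustration; they are not taken from the paper, which may use a different tokenization of matrix coefficients.

```python
import numpy as np

def encode_number(x, mantissa_digits=3):
    """Encode a float as sign / mantissa / exponent tokens.

    Hypothetical illustrative encoding, e.g. -3.14 -> ['-', '314', 'E-2'].
    """
    sign = '+' if x >= 0 else '-'
    ax = abs(x)
    if ax == 0.0:
        return [sign, '0' * mantissa_digits, 'E0']
    exp = int(np.floor(np.log10(ax))) - (mantissa_digits - 1)
    mant = int(round(ax / 10.0 ** exp))
    return [sign, str(mant), f'E{exp}']

def encode_matrix(m):
    """Flatten a matrix into a token sequence, row by row, with dimension tokens."""
    tokens = [f'V{m.shape[0]}', f'V{m.shape[1]}']
    for x in m.flatten():
        tokens.extend(encode_number(float(x)))
    return tokens

# Example: a 5x5 matrix with i.i.d. coefficients in [-10, 10],
# paired with its transpose as the target sequence.
rng = np.random.default_rng(0)
A = rng.uniform(-10, 10, size=(5, 5))
src = encode_matrix(A)
tgt = encode_matrix(A.T)
print(src[:8], '...', len(src), 'source tokens')
```

Source/target pairs of this kind (for instance a matrix and its transpose, sum, product, or inverse) could then be used to train a standard sequence-to-sequence transformer, with the input distribution varied (e.g. non-i.i.d. coefficients) to study out-of-domain generalization.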