Transformers can learn to perform numerical computations from examples alone. I study nine problems of linear algebra, from basic matrix operations to eigenvalue decomposition and matrix inversion, and introduce and discuss four encoding schemes for representing real numbers. On all problems, transformers trained on sets of random matrices achieve high accuracy (over 90%). The models are robust to noise and can generalize beyond their training distribution. In particular, models trained to predict Laplace-distributed eigenvalues generalize to other classes of matrices: Wigner matrices and matrices with positive eigenvalues. The reverse is not true.
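The abstract names four encoding schemes for real numbers without describing them. To make the idea of "encoding a real number as tokens" concrete, here is a minimal sketch of one plausible base-10 scheme: a sign token, a fixed number of mantissa digits (assumed here to be three significant digits), and an exponent token. The function names, the token spellings (`"+"`, `"E-2"`), and the `V`-prefixed matrix dimension tokens are hypothetical illustrations, not the exact vocabulary used in the work.

```python
def encode_p10(x: float, precision: int = 3) -> list[str]:
    """Encode x as [sign, d1, ..., d_precision, exponent] (hypothetical tokens)."""
    sign = "-" if x < 0 else "+"
    # Scientific notation rounds to `precision` significant digits and handles
    # carries such as 9.996 -> 1.00e+01 for us.
    mantissa, exponent = f"{abs(x):.{precision - 1}e}".split("e")
    digits = mantissa.replace(".", "")
    # Shift the exponent so that x ~= sign * int(digits) * 10**exponent.
    return [sign, *digits, f"E{int(exponent) - (precision - 1)}"]


def decode_p10(tokens: list[str]) -> float:
    """Invert encode_p10 (up to the rounding applied during encoding)."""
    sign = -1 if tokens[0] == "-" else 1
    mantissa = int("".join(tokens[1:-1]))
    exponent = int(tokens[-1][1:])
    return sign * mantissa * 10.0 ** exponent


def encode_matrix(rows: list[list[float]], precision: int = 3) -> list[str]:
    """Flatten a matrix row by row, prefixed by hypothetical dimension tokens."""
    tokens = [f"V{len(rows)}", f"V{len(rows[0])}"]
    for row in rows:
        for x in row:
            tokens += encode_p10(x, precision)
    return tokens


print(encode_p10(3.14159))  # ['+', '3', '1', '4', 'E-2']
print(encode_matrix([[1.0, -2.0]]))
```

Under this assumed scheme every number costs `precision + 2` tokens, so an n×n matrix becomes a sequence of roughly n²·(precision + 2) tokens; decoding is lossy only through the rounding to three significant digits (for example, -271.8 round-trips to -272.0).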