Solving symbolic mathematics has always been in the arena of human ingenuity, requiring compositional reasoning and recurrence. However, recent studies have shown that large-scale language models such as transformers are universal and, surprisingly, can be trained in a sequence-to-sequence fashion to solve complex mathematical equations. These large transformer models need enormous amounts of training data to generalize to unseen symbolic mathematics problems. In this paper, we present a sample-efficient way of solving symbolic mathematics tasks by first pretraining the transformer model on language translation and then fine-tuning the pretrained model on the downstream task of symbolic mathematics. With our pretrained model, we achieve accuracy on the integration task comparable to the state of the art in deep learning for symbolic mathematics while using approximately $1.5$ orders of magnitude fewer training samples. The test accuracy on differential equation tasks is considerably lower than on integration, as these tasks require higher-order recursion that is not present in language translation. We pretrain our model with different pairs of languages for translation. Our results reveal a language bias in solving symbolic mathematics tasks. Finally, we study the robustness of the fine-tuned model on symbolic math tasks against distribution shift, and our approach generalizes better under distribution shift for the function integration task.
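The sketch below is not the authors' code; it is a minimal illustration of the two-stage pipeline the abstract describes (pretrain a sequence-to-sequence transformer on translation, then fine-tune the same weights on symbolic-math token pairs), assuming a standard PyTorch nn.Transformer. The vocabulary size, hyperparameters, and random placeholder batches are illustrative assumptions only; positional encodings and tokenization of prefix-notation expressions are omitted for brevity.

import torch
import torch.nn as nn

VOCAB = 512            # shared source/target vocabulary size (placeholder)
PAD, D_MODEL = 0, 256

class Seq2Seq(nn.Module):
    """Encoder-decoder transformer shared across both training stages."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL, padding_idx=PAD)
        self.transformer = nn.Transformer(d_model=D_MODEL, nhead=8,
                                          num_encoder_layers=4,
                                          num_decoder_layers=4,
                                          batch_first=True)
        self.proj = nn.Linear(D_MODEL, VOCAB)

    def forward(self, src, tgt):
        # Causal mask so each target position only attends to earlier ones.
        mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        h = self.transformer(self.embed(src), self.embed(tgt), tgt_mask=mask)
        return self.proj(h)

def run_epoch(model, batches, optimizer):
    """One pass of teacher-forced cross-entropy training over token pairs."""
    loss_fn = nn.CrossEntropyLoss(ignore_index=PAD)
    for src, tgt in batches:                      # batches of token-id tensors
        logits = model(src, tgt[:, :-1])          # predict next target token
        loss = loss_fn(logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1))
        optimizer.zero_grad(); loss.backward(); optimizer.step()

model = Seq2Seq()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Stage 1: pretrain on a translation corpus (random placeholder batch here).
translation_batches = [(torch.randint(1, VOCAB, (8, 20)),
                        torch.randint(1, VOCAB, (8, 20)))]
run_epoch(model, translation_batches, opt)

# Stage 2: fine-tune the *same* model on symbolic-math sequence pairs,
# e.g. prefix-notation integrands as source and antiderivatives as target.
math_batches = [(torch.randint(1, VOCAB, (8, 32)),
                 torch.randint(1, VOCAB, (8, 32)))]
run_epoch(model, math_batches, opt)

In this setup the only difference between the two stages is the data fed to run_epoch; the pretrained translation weights serve as the initialization for the symbolic-math fine-tuning, which is the source of the sample efficiency claimed above.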