Pretrained Language Models are Symbolic Mathematics Solvers too!

Solving symbolic mathematics has always been in the arena of human ingenuity, requiring compositional reasoning and recurrence. However, recent studies have shown that large-scale language models such as transformers are universal and, surprisingly, can be trained on a sequence-to-sequence task to solve complex mathematical equations. These large transformer models need enormous amounts of training data to generalize to unseen symbolic mathematics problems. In this paper, we present a sample-efficient way of solving symbolic tasks: we first pretrain the transformer model on language translation and then fine-tune the pretrained model on the downstream task of symbolic mathematics. With our pretrained model, we achieve comparable accuracy on the integration task while using around 1.5 orders of magnitude (roughly 30 times) fewer training samples than the state-of-the-art deep learning approach to symbolic mathematics. Test accuracy on the differential equation tasks is considerably lower than on integration, because these tasks need higher-order recursions that are not present in language translation. We explain the generalizability of our pretrained language model through the Anna Karenina Principle (AKP). We pretrain our model on different language-translation pairs, and our results reveal a language bias in solving symbolic mathematics tasks. Finally, we study the robustness of the fine-tuned model on symbolic math tasks against distribution shift; our approach generalizes better under distribution shift for function integration.
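The abstract frames integration as a sequence-to-sequence task but does not say how expressions become token sequences. A common convention in the prior deep-learning-for-symbolic-mathematics work (Lample and Charton) is to store each expression as a tree and serialize it in prefix (Polish) notation, so one integration example becomes a pair of token sequences: integrand in, antiderivative out. The sketch below assumes that convention. The tuple-based tree, the operator names, the `ARITY` table, and the functions `to_prefix`/`from_prefix` are hypothetical illustrations, not the paper's code. For simplicity, integers are kept as single tokens.

```python
from typing import List, Tuple, Union

# Hypothetical tree representation (an assumption for this sketch):
# a leaf is a str token (variable or constant); an internal node is a
# tuple (operator, child1, child2, ...).
Expr = Union[str, Tuple]

# Operator arities; needed to decode a flat prefix sequence back into a tree.
ARITY = {"add": 2, "mul": 2, "pow": 2, "cos": 1, "sin": 1, "exp": 1}


def to_prefix(expr: Expr) -> List[str]:
    """Serialize an expression tree into prefix-notation tokens."""
    if isinstance(expr, str):
        return [expr]
    op, *children = expr
    tokens = [op]
    for child in children:
        tokens.extend(to_prefix(child))
    return tokens


def from_prefix(tokens: List[str]) -> Expr:
    """Parse prefix tokens back into a tree; raise ValueError if malformed."""
    def parse(pos: int) -> Tuple[Expr, int]:
        if pos >= len(tokens):
            raise ValueError("unexpected end of token sequence")
        tok = tokens[pos]
        n = ARITY.get(tok, 0)
        if n == 0:  # leaf token
            return tok, pos + 1
        children = []
        pos += 1
        for _ in range(n):
            child, pos = parse(pos)
            children.append(child)
        return (tok, *children), pos

    tree, end = parse(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete expression")
    return tree


# One integration training pair:
# source f(x) = 2*x*cos(x**2), target F(x) = sin(x**2), since F'(x) = f(x).
integrand = ("mul", ("mul", "2", "x"), ("cos", ("pow", "x", "2")))
antiderivative = ("sin", ("pow", "x", "2"))

src = to_prefix(integrand)
tgt = to_prefix(antiderivative)
print(" ".join(src))  # mul mul 2 x cos pow x 2
print(" ".join(tgt))  # sin pow x 2
```

A fine-tuned model would read the source tokens and generate the target tokens. Because `from_prefix` rejects malformed sequences, it can also serve as a cheap first filter on model outputs before any symbolic check of the predicted antiderivative.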


