NumGPT: 提高预培训模式的创造能力 (NumGPT: Improving Numeracy Ability of Generative Pre-trained Models)

Existing generative pre-trained language models (e.g., GPT) focus on modeling the language structure and semantics of general texts. However, those models do not consider the numerical properties of numbers and cannot perform robustly on numerical reasoning tasks (e.g., math word problems and measurement estimation). In this paper, we propose NumGPT, a generative pre-trained model that explicitly models the numerical properties of numbers in texts. Specifically, it leverages a prototype-based numeral embedding to encode the mantissa of the number and an individual embedding to encode the exponent of the number. A numeral-aware loss function is designed to integrate numerals into the pre-training objective of NumGPT. We conduct extensive experiments on four different datasets to evaluate the numeracy ability of NumGPT. The experiment results show that NumGPT outperforms baseline models (e.g., GPT and GPT with DICE) on a range of numerical reasoning tasks such as measurement estimation, number comparison, math word problems, and magnitude classification. Ablation studies are also conducted to evaluate the impact of pre-training and model hyperparameters on the performance.

翻译：在本文中,我们提议NumGPT, 这是一种基因化的预培训前的模型,明确模拟文本中数字的数值属性。具体地说,它利用一个原型基于数字的嵌入点来编码数字的曼蒂萨,并用个人嵌入来编码数字的缩略图。一个数字认知损失功能旨在将数字纳入NumGPT的培训前目标中。我们还对四个不同的数据集进行了广泛的实验,以评估NumGPT的算术能力。实验结果显示,NumGPT在一系列数字推理任务方面,如计量估计、数字比较、数学词问题和规模分类,比基准模型(例如,GPT和GPT与DICE)更符合基准模型。

相关内容

MoDELS

关注 43

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

最新《弱监督预训练语言模型微调》报告，52页ppt

专知会员服务

38+阅读 · 2020年12月26日