NumGPT: 提高预培训模式的创造能力 (NumGPT: Improving Numeracy Ability of Generative Pre-trained Models) - 专知论文

会员服务 ·

0

MoDELS · Extensibility · 损失函数（机器学习） · 语言模型化 · Performer ·

2021 年 9 月 7 日

NumGPT: Improving Numeracy Ability of Generative Pre-trained Models

翻译：NumGPT: 提高预培训模式的创造能力

Zhihua Jin,Xin Jiang,Xingbo Wang,Qun Liu,Yong Wang,Xiaozhe Ren,Huamin Qu

from arxiv, 8 pages, 3 figures

Existing generative pre-trained language models (e.g., GPT) focus on modeling the language structure and semantics of general texts. However, those models do not consider the numerical properties of numbers and cannot perform robustly on numerical reasoning tasks (e.g., math word problems and measurement estimation). In this paper, we propose NumGPT, a generative pre-trained model that explicitly models the numerical properties of numbers in texts. Specifically, it leverages a prototype-based numeral embedding to encode the mantissa of the number and an individual embedding to encode the exponent of the number. A numeral-aware loss function is designed to integrate numerals into the pre-training objective of NumGPT. We conduct extensive experiments on four different datasets to evaluate the numeracy ability of NumGPT. The experiment results show that NumGPT outperforms baseline models (e.g., GPT and GPT with DICE) on a range of numerical reasoning tasks such as measurement estimation, number comparison, math word problems, and magnitude classification. Ablation studies are also conducted to evaluate the impact of pre-training and model hyperparameters on the performance.

翻译：在本文中,我们提议NumGPT, 这是一种基因化的预培训前的模型,明确模拟文本中数字的数值属性。具体地说,它利用一个原型基于数字的嵌入点来编码数字的曼蒂萨,并用个人嵌入来编码数字的缩略图。一个数字认知损失功能旨在将数字纳入NumGPT的培训前目标中。我们还对四个不同的数据集进行了广泛的实验,以评估NumGPT的算术能力。实验结果显示,NumGPT在一系列数字推理任务方面,如计量估计、数字比较、数学词问题和规模分类,比基准模型(例如,GPT和GPT与DICE)更符合基准模型。

0

相关内容

MoDELS

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【斯坦福&Facebook】生成式对抗变换器，Generative Adversarial Transformers

专知会员服务

21+阅读 · 2021年4月21日

【机器学习傻瓜式入门，443页pdf】Machine Learning For Dummies, 2nd Edition

【机器学习傻瓜式入门，443页pdf】Machine Learning For Dummies, 2nd Edition

专知会员服务

71+阅读 · 2021年1月26日

【EMNLP2020】自然语言生成，Neural Language Generation

【EMNLP2020】自然语言生成，Neural Language Generation

专知会员服务

39+阅读 · 2020年11月20日

最新【深度生成模型】Deep Generative Models，104页ppt

最新【深度生成模型】Deep Generative Models，104页ppt

专知会员服务

71+阅读 · 2020年10月24日

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

专知会员服务

51+阅读 · 2020年5月3日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

103+阅读 · 2020年4月25日

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

专知会员服务

32+阅读 · 2020年2月21日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

已删除

将门创投

6+阅读 · 2018年12月3日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Foundations of Symbolic Languages for Model Interpretability

Arxiv

7+阅读 · 2021年10月5日

Causality and Generalizability: Identifiability and Learning Methods

Arxiv

12+阅读 · 2021年10月4日

A Survey of Knowledge Enhanced Pre-trained Models

Arxiv

28+阅读 · 2021年10月1日

CLINE: Contrastive Learning with Semantic Negative Examples for Natural Language Understanding

Arxiv

3+阅读 · 2021年7月1日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

Do NLP Models Know Numbers? Probing Numeracy in Embeddings

Arxiv

5+阅读 · 2019年9月17日

Pre-trained Language Model Representations for Language Generation

Arxiv

5+阅读 · 2019年4月1日

Generative Adversarial Autoencoder Networks

Arxiv

11+阅读 · 2018年3月23日

Expeditious Generation of Knowledge Graph Embeddings

Arxiv

7+阅读 · 2018年3月21日

VIP会员

文章信息

相关主题

损失函数（机器学习）

语言模型化

相关VIP内容

【斯坦福&Facebook】生成式对抗变换器，Generative Adversarial Transformers

专知会员服务

21+阅读 · 2021年4月21日

【机器学习傻瓜式入门，443页pdf】Machine Learning For Dummies, 2nd Edition

【机器学习傻瓜式入门，443页pdf】Machine Learning For Dummies, 2nd Edition

专知会员服务

71+阅读 · 2021年1月26日

【EMNLP2020】自然语言生成，Neural Language Generation

【EMNLP2020】自然语言生成，Neural Language Generation

专知会员服务

39+阅读 · 2020年11月20日

最新【深度生成模型】Deep Generative Models，104页ppt

最新【深度生成模型】Deep Generative Models，104页ppt

专知会员服务

71+阅读 · 2020年10月24日

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

专知会员服务

51+阅读 · 2020年5月3日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

103+阅读 · 2020年4月25日

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

专知会员服务

32+阅读 · 2020年2月21日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《在单一作战合成环境（SSE）中运用人工智能与大型语言模型以提供灵活人文地形及可信角色组》报告

《俄罗斯的未来战争方式第二部分：核威慑》报告

《提示战争：大语言模型如何决定军事干预》报告

《俄罗斯的未来战争方式第三部分：军事改革》报告

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

已删除

将门创投

6+阅读 · 2018年12月3日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Foundations of Symbolic Languages for Model Interpretability

Arxiv

7+阅读 · 2021年10月5日

Causality and Generalizability: Identifiability and Learning Methods

Arxiv

12+阅读 · 2021年10月4日

A Survey of Knowledge Enhanced Pre-trained Models

Arxiv

28+阅读 · 2021年10月1日

CLINE: Contrastive Learning with Semantic Negative Examples for Natural Language Understanding

Arxiv

3+阅读 · 2021年7月1日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

Do NLP Models Know Numbers? Probing Numeracy in Embeddings

Arxiv

5+阅读 · 2019年9月17日

Pre-trained Language Model Representations for Language Generation

Arxiv

5+阅读 · 2019年4月1日

Generative Adversarial Autoencoder Networks

Arxiv

11+阅读 · 2018年3月23日

Expeditious Generation of Knowledge Graph Embeddings

Arxiv

7+阅读 · 2018年3月21日

微信扫码咨询专知VIP会员