Numbers are an essential component of text and, like any other word tokens, form part of the data from which natural language processing (NLP) models are built and deployed. Although most NLP tasks do not treat numbers distinctly, NLP models nonetheless exhibit a degree of latent numeracy. In this work, we attempt to tap this latent ability of state-of-the-art NLP models and transfer it to boost performance on related tasks. Our proposed classification of numbers into entities helps NLP models perform well on several tasks, including a handcrafted Fill-In-The-Blank (FITB) task and question answering with joint embeddings, outperforming BERT and RoBERTa baseline classifiers.
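As a minimal sketch of the kind of FITB probe referred to above, the snippet below queries a pretrained BERT masked language model for the most likely fillers of a blanked-out number; the probe sentence and the model choice here are illustrative assumptions, not the paper's actual data or setup.

```python
# Hypothetical FITB numeracy probe: ask a pretrained masked-LM to fill
# in a blanked number and inspect whether numerically plausible tokens
# rank highly. The example sentence is an assumption for illustration.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

sentence = "A typical adult human has [MASK] fingers on each hand."
sentence = sentence.replace("[MASK]", fill_mask.tokenizer.mask_token)

# Print the top candidate fillers with their scores.
for prediction in fill_mask(sentence, top_k=5):
    print(f"{prediction['token_str']:>10s}  {prediction['score']:.4f}")
```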