NLP systems rarely give special consideration to numbers found in text. This starkly contrasts with the consensus in neuroscience that, in the brain, numbers are represented differently from words. We arrange recent NLP work on numeracy into a comprehensive taxonomy of tasks and methods. We break down the subjective notion of numeracy into 7 subtasks, arranged along two dimensions: granularity (exact vs. approximate) and units (abstract vs. grounded). We analyze the myriad representational choices made by 18 previously published number encoders and decoders. We synthesize best practices for representing numbers in text and articulate a vision for holistic numeracy in NLP, comprising design trade-offs and a unified evaluation.