Recent advances in neural network language models have shown that expressive meaning representations can be derived by leveraging linguistic associations in large-scale natural language data. These potentially Gestalt representations have enabled state-of-the-art performance in many practical applications, suggesting we are on a path toward empirically deriving a robust and expressive computable semantics. A key question therefore arises: how far can language data alone enable computers to understand necessary truths about the physical world? Attention to this question is warranted because our future interactions with intelligent machines depend on how well our techniques represent and process the concepts (objects, properties, and processes) that humans commonly observe to be true. After reviewing existing protocols, this work explores this question using a novel and tightly controlled reasoning test, and highlights what models might learn directly from pure linguistic data.