The power of word embeddings is attributed to the linguistic theory that similar words appear in similar contexts. This idea is commonly invoked through the line "you shall know a word by the company it keeps," a quote from British linguist J.R. Firth who, along with his American colleague Zellig Harris, is often credited with the invention of "distributional semantics." While both Firth and Harris are cited in all major NLP textbooks and many foundational papers, the content of their theories and the differences between them are seldom discussed. Engaging in a close reading of their work, we discover two distinct and in many ways divergent theories of meaning. One focuses exclusively on the internal workings of linguistic forms, while the other invites us to consider words in new company: not just with other linguistic elements, but also in a broader cultural and situational context. Contrasting these theories from the perspective of current debates in NLP, we discover in Firth a figure who could guide the field towards a more culturally grounded notion of semantics. We consider how an expanded notion of "context" might be modeled in practice through two different strategies: comparative stratification and syntagmatic extension.