Cultural Cartography with Word Embeddings

Using the frequency of keywords is a classic approach in the formal analysis of text, but it has the drawback of glossing over the relationality of word meanings. Word embedding models overcome this problem by constructing a standardized and continuous "meaning space" where each word is assigned a location based on its similarity to other words in how they are used in natural language samples. We show how word embeddings are commensurate with prevailing theories of meaning in sociology and can be put to the task of interpretation via two kinds of navigation. First, one can hold terms constant and measure how the embedding space moves around them, much like astronomers measured the changing of celestial bodies with the seasons. Second, one can hold the embedding space constant and see how documents or authors move relative to it, just as ships use the stars on a given night to determine their location. Using the empirical case of immigration discourse in the United States, we demonstrate the merits of these two broad strategies for advancing important topics in cultural theory, including social marking, media fields, echo chambers, and cultural diffusion and change more broadly.
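
The second kind of navigation (a fixed embedding space, with documents located relative to it) can be illustrated with a minimal sketch. It assumes a simple stand-in measure, the cosine similarity between a document's averaged word vector and a concept term, which is not necessarily the measure used in the paper. The `embedding` table, the two toy documents, and the function names `centroid` and `position` are all invented for illustration.

```python
import math

# Toy 2-d embedding, invented for illustration only (not trained on any corpus).
embedding = {
    "border":    [0.9, 0.2],
    "patrol":    [0.8, 0.1],
    "security":  [1.0, 0.0],
    "family":    [0.1, 1.0],
    "community": [0.2, 0.9],
    "work":      [0.5, 0.5],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def centroid(tokens, vectors):
    """Average the vectors of the tokens that appear in the embedding."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token of this document is in the embedding")
    return [sum(dim) / len(known) for dim in zip(*known)]

def position(tokens, concept, vectors):
    """Locate a document relative to a concept term in the fixed space."""
    return cosine(centroid(tokens, vectors), vectors[concept])

# Two hypothetical documents; out-of-vocabulary tokens such as "the" are skipped.
doc_a = ["the", "border", "patrol"]
doc_b = ["family", "community", "work"]

for name, doc in [("doc_a", doc_a), ("doc_b", doc_b)]:
    print(name, round(position(doc, "security", embedding), 3),
          round(position(doc, "family", embedding), 3))
```

Because the space is held constant, the two documents' scores are directly comparable: in this toy setup `doc_a` sits closer to "security" and `doc_b` closer to "family".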

Related Content

Word Vector Representations

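A distributed representation encodes language as dense, low-dimensional, continuous vectors. Researchers found early on that learned word embeddings exhibit analogy relations, for example apple − apples ≈ car − cars and man − woman ≈ king − queen. Methods for learning such embeddings can be trained directly on large-scale unlabeled corpora. The quality of word embeddings also depends heavily on the choice of context window size: larger context windows usually yield embeddings that reflect topical information, while smaller context windows yield embeddings that reflect a word's function and contextual semantics.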
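
The analogy relation can be made concrete with a small sketch. The vectors below are hand-built toy values (axis 0 loosely standing for gender, axis 1 for royalty), not output from any trained model, and the helper name `analogy` is hypothetical. With real embeddings the relation holds only approximately.

```python
import math

# Hand-built toy vectors for illustration: axis 0 ~ gender, axis 1 ~ royalty.
toy_vectors = {
    "man":   [1.0, 0.0],
    "woman": [-1.0, 0.0],
    "king":  [1.0, 1.0],
    "queen": [-1.0, 1.0],
    "apple": [0.2, -1.0],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def analogy(a, b, c, vectors):
    """Solve 'a is to b as c is to ?' by finding the word nearest to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the query words themselves, as is common practice.
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# man - woman and king - queen are the same offset in this toy space.
offset_1 = [m - w for m, w in zip(toy_vectors["man"], toy_vectors["woman"])]
offset_2 = [k - q for k, q in zip(toy_vectors["king"], toy_vectors["queen"])]
print(offset_1, offset_2)
print(analogy("man", "woman", "king", toy_vectors))
```

Ranking candidates by cosine similarity rather than requiring exact equality reflects the "≈" in the relation: trained embeddings give a nearest neighbor, not an exact match.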