Cultural areas are a useful concept that cross-fertilizes diverse fields across the social sciences. Knowing how humans organize and relate their ideas and behavior within a society helps explain their actions and attitudes towards different issues. However, the selection of the common traits that shape a cultural area is somewhat arbitrary. What is needed is a method that can leverage the massive amounts of data coming online, especially through social media, to identify cultural regions without ad-hoc assumptions, biases or prejudices. In this work, we take a crucial step in this direction by introducing a method to infer cultural regions based on the automatic analysis of large datasets of microblogging posts. Our approach is based on the principle that cultural affiliation can be inferred from the topics that people discuss among themselves. Specifically, we measure regional variations in written discourse generated in American social media. From the frequency distributions of content words in geotagged Tweets, we identify regional hotspots of word usage, and from these we derive principal components of regional variation. Through a hierarchical clustering of the data in this lower-dimensional space, our method yields clear cultural areas and the topics of discussion that define them. We obtain a manifest North-South separation, which is primarily influenced by African American culture, and further contiguous (East-West) and non-contiguous (urban-rural) divisions that provide a comprehensive picture of today's cultural areas in the US.
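The pipeline outlined above (word frequencies, regional hotspots, principal components, hierarchical clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy count matrix, the function names, the per-word z-score used as a stand-in for a hotspot statistic, and the average-linkage criterion are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def relative_frequencies(counts):
    """Rows are regions, columns are words; normalize each row to sum to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def standardize(freqs):
    """Per-word z-score across regions (assumed stand-in for a hotspot statistic)."""
    mu = freqs.mean(axis=0)
    sigma = freqs.std(axis=0)
    # Words with (numerically) no regional variation carry no signal.
    sigma = np.where(sigma < 1e-12, 1.0, sigma)
    return (freqs - mu) / sigma

def principal_components(z, k):
    """Project the regions onto the first k principal components via SVD."""
    centered = z - z.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering; returns one label per row."""
    clusters = [[i] for i in range(len(points))]
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(points), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Hypothetical counts: 4 regions x 3 content words (invented toy data).
counts = [
    [90, 10, 50],
    [85, 15, 50],
    [10, 90, 50],
    [12, 88, 50],
]
scores = principal_components(standardize(relative_frequencies(counts)), k=2)
labels = agglomerative(scores, n_clusters=2)
print(labels)
```

In this toy example the first two regions share one cluster and the last two share another, because they differ mainly in their use of the first two words. A real analysis would replace the z-score with a proper spatial hotspot statistic and choose the number of components and clusters from the data.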