This paper introduces SocialVec, a general framework for eliciting social world knowledge from social networks, and applies this framework to Twitter. SocialVec learns low-dimensional embeddings of popular accounts, which represent entities of general interest, based on their co-occurrence patterns within the accounts followed by individual users, thus modeling entity similarity in socio-demographic terms. Just as word embeddings facilitate text-processing tasks, we expect social entity embeddings to benefit tasks of a social flavor. We learn social embeddings for roughly 200,000 popular accounts from a sample of the Twitter network that includes more than 1.3 million users and the accounts that they follow, and evaluate the resulting embeddings on two different tasks. The first task is the automatic inference of users' personal traits from their social media profiles. In the second task, we use SocialVec embeddings to gauge the political bias of news sources on Twitter. In both cases, we show SocialVec embeddings to be advantageous compared with existing entity embedding schemes. We will make the SocialVec entity embeddings publicly available to support further exploration of social world knowledge as reflected on Twitter.
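To make the co-occurrence idea concrete, the following is a minimal sketch of how accounts that tend to be followed by the same users end up with similar representations. It is not the SocialVec training procedure: it substitutes sparse positive pointwise mutual information (PPMI) vectors for the learned low-dimensional embeddings, and the `follows` data, account names, and function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log, sqrt

# Hypothetical toy data: each user maps to the set of popular accounts they follow.
follows = {
    "user_a": {"nba", "espn", "nike"},
    "user_b": {"nba", "espn"},
    "user_c": {"nytimes", "cnn", "bbc"},
    "user_d": {"nytimes", "cnn"},
    "user_e": {"espn", "nike"},
    "user_f": {"cnn", "bbc"},
}


def ppmi_vectors(follow_lists):
    """Build a sparse PPMI vector per account from within-user co-occurrences."""
    account_count = Counter()  # number of users following each account
    pair_count = Counter()     # number of users following both accounts of a pair
    for accounts in follow_lists.values():
        account_count.update(accounts)
        for a, b in combinations(sorted(accounts), 2):
            pair_count[(a, b)] += 1
            pair_count[(b, a)] += 1

    n_users = len(follow_lists)
    vectors = {account: {} for account in account_count}
    for (a, b), joint in pair_count.items():
        # PMI = log(P(a, b) / (P(a) * P(b))), with probabilities taken over users.
        pmi = log(joint * n_users / (account_count[a] * account_count[b]))
        if pmi > 0:  # keep only positive associations (PPMI)
            vectors[a][b] = pmi
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = ppmi_vectors(follows)
sim_sports = cosine(vectors["nba"], vectors["nike"])  # accounts sharing followers
sim_cross = cosine(vectors["nba"], vectors["cnn"])    # accounts with disjoint followers
```

In this toy sample, `sim_sports` exceeds `sim_cross` because the two sports accounts are co-followed with `espn`, while `nba` and `cnn` share no followers. A learned embedding would replace the sparse vectors with dense, low-dimensional ones but would rely on the same co-following signal.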