Social media is often considered a democratic space in which people connect and interact with each other regardless of gender, race, or any other demographic attribute. Despite numerous efforts to explore demographic aspects of social media, it remains unclear whether social media perpetuates old inequalities from the offline world. In this dissertation, we attempt to identify the gender and race of Twitter users located in the United States using advanced image processing algorithms from Face++. We investigate how different demographic groups connect with each other and how they differ in linguistic style and interests. We quantify the extent to which each group follows and interacts with the others, and the extent to which these connections and interactions are reflected in inequalities on Twitter. We also extract linguistic features from six categories (affective attributes, cognitive attributes, lexical density and awareness, temporal references, social and personal concerns, and interpersonal focus) to identify similarities and differences in the messages the groups share on Twitter. Furthermore, we compute the absolute ranking difference of top phrases between demographic groups. As a dimension of diversity, we use the topics of interest that we retrieve for each user. Our analysis shows that users identified as white and male tend to attain higher positions on Twitter, in terms of both the number of followers and the number of times they appear in other users' lists. There are clear differences in writing style across demographic groups, for both gender and race, as well as in topics of interest. We hope our effort can stimulate the development of new theories of demographic information in the online space. Finally, we developed a Web-based system that leverages the demographic aspects of users to provide transparency to the Twitter trending topics system.
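To make the absolute ranking difference of top phrases concrete, the following is a minimal sketch, assuming phrases are ranked by frequency within each group and compared only when they occur in both groups. The function names, the tie-breaking rule, and the toy phrase lists are hypothetical illustrations and are not taken from the dissertation.

```python
from collections import Counter


def rank_phrases(phrases):
    """Map each phrase to its 1-based rank by descending frequency.

    Ties are broken alphabetically (an assumption made for determinism).
    """
    counts = Counter(phrases)
    ordered = sorted(counts, key=lambda p: (-counts[p], p))
    return {phrase: rank for rank, phrase in enumerate(ordered, start=1)}


def absolute_rank_difference(phrases_a, phrases_b):
    """Return {phrase: |rank_a - rank_b|} for phrases present in both groups."""
    ranks_a = rank_phrases(phrases_a)
    ranks_b = rank_phrases(phrases_b)
    shared = ranks_a.keys() & ranks_b.keys()
    return {p: abs(ranks_a[p] - ranks_b[p]) for p in shared}


# Toy, made-up phrase lists standing in for two demographic groups.
group_a = ["good morning", "good morning", "game day", "thank you"]
group_b = ["thank you", "thank you", "thank you",
           "good morning", "game day", "game day"]

diffs = absolute_rank_difference(group_a, group_b)
```

A large value for a phrase indicates that it is prominent in one group but not in the other, while a value of zero means both groups rank it identically.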