Advice forums are a crowdsourced way to reinforce cultural norms and moral behavior. Sites like Reddit contain massive amounts of natural-language human interaction, with rules and norms unique to each subreddit community. To explore this data, we created a dataset of the top 1,000 posts from each of two such forums, r/AmItheAsshole and r/relationships, and extracted natural language features including sentiment, similarity, word frequency, and demographics using both algorithmic and manual methods. We also developed a method to extract demographic information from subreddit posts, examined how post authors' self-disclosures reflect the communities in which their posts are shared, and discussed how the authors' language choices might relate to broader social patterns. We observed some differences between the subreddits in word frequency, demographic disclosure, and gendered language. In both subreddits, female posters outnumbered male posters, and posters tended to use more words referring to the opposite gender than to their own. Gender-diverse posters were uncommon. Implications for future research include a more careful, inclusive focus on identity and disclosure and on how they interact with advice-seeking behavior in online communities.