Survivors of sexual harassment frequently share their experiences on social media, revealing their feelings and emotions and seeking advice. We observed that on Reddit, survivors regularly share long posts that describe a combination of (i) a sexual harassment incident, (ii) its effect on the survivor, including their feelings and emotions, and (iii) the advice being sought. We term such posts MeToo posts, even though they may not be tagged as such and may appear in diverse subreddits. A prospective helper (such as a counselor or even a casual reader) must understand a survivor's needs from such posts, but long posts can be time-consuming to read and respond to. Accordingly, we address the problem of extracting key information from a long MeToo post. We develop a natural language-based model that identifies the sentences in a post that describe any of these three categories. Under ten-fold cross-validation on a dataset, our model achieves a macro F1 score of 0.82. In addition, we contribute MeThree, a dataset of 8,947 labeled sentences extracted from Reddit posts. We apply the LIWC-22 toolkit to MeThree to examine how the language patterns of sentences in the three categories reveal differences in emotional tone, authenticity, and other aspects.
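The macro F1 score reported above is the unweighted mean of the per-category F1 scores, so a less frequent category counts as much as a frequent one. The following Python sketch illustrates that computation on a toy set of sentence labels. It is not the authors' evaluation code; the label names (`incident`, `effect`, `advice`) and the six example predictions are hypothetical and are not drawn from MeThree.

```python
def macro_f1(y_true, y_pred, labels):
    """Return the unweighted mean of per-label F1 scores (macro averaging)."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        # True positives, false positives, and false negatives for this label.
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    # Each label contributes equally, regardless of how many sentences it has.
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels and toy predictions, for illustration only.
labels = ["incident", "effect", "advice"]
y_true = ["incident", "incident", "effect", "advice", "advice", "effect"]
y_pred = ["incident", "effect", "effect", "advice", "incident", "effect"]

# Per-label F1: incident 0.5, effect 0.8, advice 2/3; macro F1 = 59/90.
print(round(macro_f1(y_true, y_pred, labels), 4))  # prints 0.6556
```

Under ten-fold cross-validation, a score like this would typically be computed on each held-out fold and then averaged across the ten folds.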