Conversations on social media (SM) are increasingly being used to investigate social issues on the web, such as online harassment and rumor spread. For such issues, a common thread of research uses adversarial reactions, e.g., replies pointing out factual inaccuracies in rumors. Though adversarial reactions are prevalent in online conversations, inferring those adverse views (or stance) from the text of replies is difficult and requires complex natural language processing (NLP) models. Moreover, conventional NLP models for stance mining need labeled data for supervised learning. Obtaining labeled conversations is itself challenging, as conversations can be on any topic and topics change over time. These challenges make learning the stance a difficult NLP problem. In this research, we first create a new stance dataset comprising three different topics by labeling both users' opinions on the topics (as in pro/con) and users' stance while replying to others' posts (as in favor/oppose). As we find limitations with supervised approaches, we propose a weakly-supervised approach to predict the stance in Twitter replies. Our novel method uses a small number of hashtags to generate weak labels for Twitter replies. Compared to supervised learning, our method improves the mean F1-macro by 8% on the hand-labeled dataset without using any hand-labeled examples in the training set. We further show the applicability of our proposed method on COVID-19 related conversations on Twitter.
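The F1-macro score reported above averages the per-class F1 scores with equal weight, so a minority stance class counts as much as a majority one. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `macro_f1` and the toy favor/oppose labels are assumptions made for illustration only.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        # Count true positives, false positives, and false negatives for this class.
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical gold and predicted stance labels for six replies.
gold = ["favor", "favor", "oppose", "oppose", "oppose", "favor"]
pred = ["favor", "oppose", "oppose", "oppose", "favor", "favor"]
score = macro_f1(gold, pred)
print(f"macro-F1 = {score:.3f}")
```

Because each class contributes equally, a model that ignores a rare stance class is penalized more heavily under this metric than under plain accuracy.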